Primary key for Cassandra in the basic-auth plugin

I see that in the basic-auth plugin, the primary key is id and there is an index on username. This may work well with Postgres, but in Cassandra, secondary indexes are costly. Why isn’t username the primary/partition key when it’s also a unique field?

I'm asking because lookups in the basic-auth plugin are on username. // @hbagdi @thibaultcha
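To illustrate the concern, here is a toy Python model (not Kong or Cassandra code) of the two layouts. It assumes a replication factor of 1, a simple hash in place of Cassandra's token ring, and a coordinator that probes nodes one at a time. All class and function names are hypothetical. The only point is that a node-local index on username cannot tell the coordinator which node owns the row, while a username partition key can.

```python
import hashlib

NUM_NODES = 5  # assumption: mirrors the 5-node cluster used in the tests


def owner(partition_key: str) -> int:
    """Map a partition key to the node that owns it (stand-in for token hashing)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class IdKeyedTable:
    """Hypothetical layout: PRIMARY KEY (id), secondary index on username.

    Rows are placed by id, so each node indexes only the usernames it holds.
    """

    def __init__(self) -> None:
        self.local_index = [dict() for _ in range(NUM_NODES)]

    def insert(self, cred_id: str, username: str) -> None:
        self.local_index[owner(cred_id)][username] = cred_id

    def find_by_username(self, username: str):
        # The owning node cannot be derived from the username, so nodes are
        # probed one by one until the row is found.
        contacted = 0
        for index in self.local_index:
            contacted += 1
            if username in index:
                return index[username], contacted
        return None, contacted


class UsernameKeyedTable:
    """Hypothetical layout: PRIMARY KEY (username)."""

    def __init__(self) -> None:
        self.rows = [dict() for _ in range(NUM_NODES)]

    def insert(self, cred_id: str, username: str) -> None:
        self.rows[owner(username)][username] = cred_id

    def find_by_username(self, username: str):
        # The username hashes straight to the single node that owns the row.
        return self.rows[owner(username)].get(username), 1


def compare(num_users: int):
    """Return average nodes contacted per lookup: (id-keyed, username-keyed)."""
    by_id, by_name = IdKeyedTable(), UsernameKeyedTable()
    users = [(f"id-{i}", f"user-{i}") for i in range(num_users)]
    for cred_id, username in users:
        by_id.insert(cred_id, username)
        by_name.insert(cred_id, username)
    avg_id = sum(by_id.find_by_username(u)[1] for _, u in users) / num_users
    avg_name = sum(by_name.find_by_username(u)[1] for _, u in users) / num_users
    return avg_id, avg_name


if __name__ == "__main__":
    avg_id, avg_name = compare(10_000)
    print(f"id-keyed + index: {avg_id:.2f} nodes/lookup")
    print(f"username-keyed:   {avg_name:.2f} nodes/lookup")
```

In this model the username-keyed lookup always touches exactly one node, while the indexed lookup touches more than one on average. Real Cassandra behaviour differs in the details (replication, consistency level, parallel range scans), so treat the numbers as qualitative only.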

I would like to share the results of a few performance runs that we did, and explain why I suspect the lookups in the basic-auth plugin are inefficient.

Test Setup

  • 3 Kong nodes running on c5.2xlarge, backed by Cassandra, with cassandra_consistency set to ONE.
  • A 5-node Cassandra cluster on c5.xlarge.
  • A dummy upstream service with the basic-auth plugin enabled.
  • 1 million consumers and 1 million credentials prepopulated, with 1 credential per consumer.
  • Test runs made calls to the endpoint with random users/credentials within this range.

Test Scenarios

  1. Scenario 1: Set of runs without cache warmup.
  2. Scenario 2: Set of runs after basicauth_credentials were warmed up in the cache, but not consumers
  3. Scenario 3: Set of runs after consumers were warmed up in the cache, but not basicauth_credentials

Observations

  • Scenario 1 had the worst performance, with Kong proxy latency in the order of seconds. We’ll keep it out of the discussion here.
  • Scenario 2 showed an initial spike in Kong proxy latency (p99 around 100ms) for around 5 minutes, after which it stabilized at ~5ms. The test sustained ~6K rps.
  • Scenario 3 was run with the same load as scenario 2, but Kong proxy latency was consistently high (p99 going up to 1s) and didn’t stabilize for the duration of the run.
  • During scenario 3, the Cassandra nodes showed high CPU usage (~90%), high network utilisation and a high number of threadpool operations (with pending ops touching ~5K).
  • These metrics stayed within limits in scenario 2.
  • Some spikes were also seen in the Cassandra metrics while the cache was warming up for basicauth_credentials. It took ~3 minutes for the cache to be populated in each run.

Evidence

Attached are some graphs for reference. Scenario 2 ran from ~11:40 till ~11:55, while scenario 3 started at ~12:57.

Screenshot 2020-05-07 at 6.44.18 PM

The performance tests were run again after forking the basic-auth plugin and making the change suggested above (username as the primary/partition key). The performance bottleneck went away completely, and the plugin performed flawlessly with over 1 million non-cached consumers and credentials. p99 proxy latency was observed to be around 12ms until the cache was warmed up.
A proposal to update the basic-auth plugin with the suggested changes is made here: https://github.com/Kong/kong/pull/5914