I see that in the basic-auth plugin, the primary key is id and there is an index on username. This may work well with Postgres, but in Cassandra, secondary indexes are costly. Why isn't username the primary/partition key when it is also a unique field?
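To illustrate the concern, here is a toy Python model (not real Cassandra code) of why a lookup on a secondary index tends to cost more than a lookup on the partition key. It assumes a simplified cluster with no replication, where each node only indexes the rows it owns. The `ToyCluster` class, the `build` helper, and the `id-N` / `user-N` naming are all made up for this sketch.

```python
import hashlib

NODE_COUNT = 5  # same size as the Cassandra cluster in the test setup


def owning_node(partition_key: str, node_count: int = NODE_COUNT) -> int:
    """Toy stand-in for token hashing: map a partition key to one node."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class ToyCluster:
    """Rows are placed by partition key; each node indexes only its own rows."""

    def __init__(self, node_count: int = NODE_COUNT) -> None:
        self.rows = [{} for _ in range(node_count)]         # partition key -> row
        self.local_index = [{} for _ in range(node_count)]  # username -> partition key

    def insert(self, partition_key: str, row: dict) -> None:
        node = owning_node(partition_key, len(self.rows))
        self.rows[node][partition_key] = row
        self.local_index[node][row["username"]] = partition_key

    def get_by_partition_key(self, partition_key: str):
        """The coordinator can compute the owner, so one node is contacted."""
        node = owning_node(partition_key, len(self.rows))
        return self.rows[node].get(partition_key), 1

    def get_by_indexed_username(self, username: str):
        """The owner is unknown, so nodes are probed until one has a match."""
        pairs = zip(self.local_index, self.rows)
        for contacted, (index, rows) in enumerate(pairs, start=1):
            key = index.get(username)
            if key is not None:
                return rows[key], contacted
        return None, len(self.rows)


def build(keyed_by: str, count: int = 1000) -> ToyCluster:
    """Populate a toy cluster partitioned by either 'id' or 'username'."""
    cluster = ToyCluster()
    for i in range(count):
        row = {"id": f"id-{i}", "username": f"user-{i}"}
        cluster.insert(row[keyed_by], row)
    return cluster
```

With `build("id")`, finding a credential by username goes through `get_by_indexed_username` and may touch every node, while with `build("username")` the same lookup is a single-node `get_by_partition_key`. A real cluster adds replication, token ranges and coordinator-side concurrency, so this only shows the shape of the problem, not its actual cost.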
I would like to share results of a few performance runs that we did, and why I suspect the lookups in basic auth plugin to be inefficient.
Test Setup
- 3 Kong nodes running on c5.2xlarge, backed by Cassandra, with cassandra_consistency set to ONE.
- 5-node Cassandra cluster on c5.xlarge.
- A dummy upstream service with the basic-auth plugin enabled.
- 1 million consumers and 1 million credentials pre-populated, with 1 credential per consumer.
- Test runs made calls to the endpoint with random users/credentials within this range.
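For anyone who wants to reproduce a similar load, the request side can be sketched roughly as below. This is not the actual test harness: the `user-N` / `password-N` naming scheme and the `random_credential` helper are assumptions for illustration, and only the Basic auth header format (base64 of `username:password`) is standard.

```python
import base64
import random
from typing import Tuple

CONSUMER_COUNT = 1_000_000  # number of pre-populated credentials in the setup


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def random_credential(rng: random.Random) -> Tuple[str, str]:
    """Pick one credential uniformly at random (hypothetical naming scheme)."""
    n = rng.randrange(CONSUMER_COUNT)
    return f"user-{n}", f"password-{n}"


if __name__ == "__main__":
    rng = random.Random()
    username, password = random_credential(rng)
    print({"Authorization": basic_auth_header(username, password)})
```

Picking users uniformly at random across the whole range means that, until the cache is warm, most requests miss it and fall through to the database.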
Test Scenarios
- Scenario 1: Set of runs without cache warmup.
- Scenario 2: Set of runs after basicauth_credentials were warmed up in the cache, but not consumers.
- Scenario 3: Set of runs after consumers were warmed up in the cache, but not basicauth_credentials.
Observations
- Scenario 1 had the worst performance, with kong proxy latency in seconds. We’ll keep it out of the discussion here.
- Scenario 2 showed an initial spike in kong proxy latency (p99 around 100ms) for around 5 minutes, after which it stabilized to ~5ms. The test gave ~6K rps.
- Scenario 3 was run with the same load as scenario 2, but Kong proxy latency was consistently high (p99 going up to 1s) and didn't stabilize for the duration of the run.
- During scenario 3, the Cassandra nodes showed high CPU usage (~90%), high network utilisation and a high number of threadpool operations (with pending ops touching ~5K).
- The above parameters were under limits in scenario 2.
- Also, a spike was seen in Cassandra metrics while the cache was warming up for basicauth_credentials. It took ~3 minutes for the cache to be populated in each run.
Evidence
Attached are some graphs for reference. Scenario 2 ran from ~11:40 till ~11:55, while scenario 3 started at ~12:57.
The performance tests were run again after forking the basic-auth plugin and making the change suggested above (username as the primary/partition key). The performance bottleneck went away completely and the plugin performed flawlessly with over 1 million non-cached consumers and credentials. p99 proxy latency was observed to be around 12ms until the cache warmed up.
A proposal to update the basic-auth plugin with the suggested changes is made here: https://github.com/Kong/kong/pull/5914