Benchmarking Kong with Cassandra vs. PostgreSQL

Hi all,

I recently benchmarked Kong with the OAuth 2.0 plugin and found something that seems bizarre to me.

Here is what I did:

I have 3 n1-highmem-4 instances (4 vCPUs and 26GB memory each) on Google Cloud Platform, managed by Google Kubernetes Engine, and I came up with the following setups:

  1. Start 1 or 2 Cassandra nodes (one per instance) and start Kong on the third instance. The 2-node Cassandra cluster is configured with cassandra_repl_strategy=SimpleStrategy and cassandra_repl_factor=2.
  2. Start 1 PostgreSQL server on one instance and Kong on another; the third instance is idle.

After Kong was set up, I used distributed JMeter (1 master with 3 slaves) to load-test Kong's /oauth2/token endpoint, i.e. token generation, which is mostly DB writes.

However, I have several questions about the results I got:

  1. 1 Cassandra node vs. 1 PostgreSQL server: PostgreSQL's throughput is almost double Cassandra's. Is that normal? I expected Cassandra to outperform PostgreSQL.
  2. 2 Cassandra nodes vs. 1 PostgreSQL server: PostgreSQL's throughput is still 50% higher than Cassandra's. Throughput with 2 Cassandra nodes did increase, but definitely not linearly.
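To put a number on "not linear", here is a rough calculation that uses only the approximate ratios quoted above (PostgreSQL about 2x one Cassandra node, about 1.5x two Cassandra nodes). No absolute throughput figures were posted, so PostgreSQL is normalized to 1.0; the function and variable names are purely illustrative.

```python
def cassandra_scaling(pg_over_one_node: float, pg_over_two_nodes: float) -> tuple[float, float]:
    """Return (speedup of 2 Cassandra nodes over 1 node, scaling efficiency).

    Throughputs are expressed relative to PostgreSQL (normalized to 1.0),
    so only the two measured ratios are needed.
    """
    pg = 1.0
    one_node = pg / pg_over_one_node    # e.g. 1.0 / 2.0 = 0.5
    two_nodes = pg / pg_over_two_nodes  # e.g. 1.0 / 1.5, about 0.667
    speedup = two_nodes / one_node      # ideal linear scaling would give 2.0
    efficiency = speedup / 2            # fraction of ideal scaling for 2 nodes
    return speedup, efficiency


speedup, efficiency = cassandra_scaling(pg_over_one_node=2.0, pg_over_two_nodes=1.5)
print(f"2-node speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

Taking the reported ratios at face value, adding the second Cassandra node bought roughly a third more throughput (about 1.33x), i.e. about two-thirds of ideal linear scaling.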

I'm not sure whether anyone else has done a similar comparison. In general, Cassandra costs more resources but didn't deliver the performance I expected. I tried tuning some Cassandra parameters, without much gain. Any ideas?

While I'm not an expert on Cassandra, this relates to the replication and consistency settings of your Cassandra cluster. Since the replication factor is set to 2 (and assuming the consistency level is ONE), every write still has to be persisted on a second replica node.
It makes sense that Postgres outperforms Cassandra in this case, as it is not replicating writes.
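A simplified model of the point above, assuming Cassandra's usual write path: the coordinator sends every write to all RF replicas, and the consistency level only decides how many acknowledgements it waits for before answering the client. This is an illustrative sketch, not Kong or driver code, and it covers only a few consistency levels (ANY, LOCAL_* and EACH_* are left out).

```python
def write_cost(rf: int, consistency: str) -> dict:
    """Simplified model of a single Cassandra write.

    The write is sent to all `rf` replicas regardless of consistency level;
    the consistency level only sets how many acks the coordinator waits for.
    """
    required_acks = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,  # majority of replicas
        "ALL": rf,
    }[consistency]
    if required_acks > rf:
        # Cassandra would reject such a request as unavailable.
        raise ValueError(f"{consistency} needs {required_acks} replicas but rf={rf}")
    return {"replicas_written": rf, "acks_waited_for": required_acks}


for rf, cl in [(1, "ONE"), (2, "ONE"), (2, "QUORUM"), (2, "ALL")]:
    print(rf, cl, write_cost(rf, cl))
```

With rf=2 and ONE, the model reports two replicas written but only one ack awaited: the client sees the latency of the fastest replica, yet both nodes still do the write work. Note also that with two replicas QUORUM and ALL coincide.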

It would be hard to get Cassandra to outperform Postgres at this scale.

Hi,

Thank you very much for your reply. So I suppose I need more nodes to get Cassandra to outperform PostgreSQL.

That alone won’t help you. You will need to tune consistency and replication settings, at the very least.

Hi,

Thank you for the tip. AFAIK, consistency level ONE should provide the highest availability, since it only requires one replica node to acknowledge a write. For replication, I set cassandra_repl_factor=2 for the 2 nodes, which means each record is duplicated once and exists on both nodes.

Then I had a hunch that changing the replication factor to 1 would be faster, since records would no longer need to be duplicated across the cluster. But once again my hunch was wrong: performance actually dropped after I changed the replication factor to 1.

I also found this issue, https://github.com/Kong/kong/issues/543, which indicates that a replication factor of 1 hurts read performance because the data exists on only one node in the cluster, but I don't quite understand why it would make write performance worse. @hbagdi Do you have any suggestions?
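For reference, a toy model of SimpleStrategy replica placement, the mechanism behind both "exists on both nodes" with RF=2 and the issue's point that with RF=1 each row lives on only one node. Assumptions: tokens are given directly as small integers instead of real partitioner hashes, each node owns a single token (no vnodes), and the node names are hypothetical.

```python
from bisect import bisect_left


def replicas(key_token: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Return the replica nodes for a key under a SimpleStrategy-like placement.

    `ring` is sorted by token; a node owns the range (previous token, its token],
    so the first replica is the first node whose token is >= key_token (wrapping
    around the ring), and the remaining replicas are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def local_replica_fraction(coordinator: str, ring, rf: int, key_tokens) -> float:
    """Fraction of keys for which `coordinator` itself holds a replica."""
    hits = sum(coordinator in replicas(t, ring, rf) for t in key_tokens)
    return hits / len(key_tokens)


ring = [(0, "node-a"), (100, "node-b")]  # two nodes, hypothetical tokens
keys = range(0, 200, 5)                  # stand-in key tokens spread over the ring

print(replicas(50, ring, rf=2))  # both nodes hold the row
print(replicas(50, ring, rf=1))  # only one node holds it
print(local_replica_fraction("node-a", ring, rf=1, key_tokens=keys))
print(local_replica_fraction("node-a", ring, rf=2, key_tokens=keys))
```

In this toy model, with RF=1 a given coordinator is a replica for only about half the keys, so the other half of the writes must be forwarded to the other node and acknowledged from there, whereas with RF=2 the coordinator is always a replica. Whether that extra hop explains the write slowdown measured here is only a hypothesis; it depends on how Kong's Cassandra driver chooses coordinators, which the sketch does not model.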

So far I feel Cassandra is resource-hungry and a bit unpredictable; even after days of effort it still can't match PostgreSQL's out-of-the-box performance. I'm not saying it's bad. On the contrary, it has proven to be highly efficient and scalable; maybe the phrase I'm looking for is "not newbie-friendly" :sweat_smile: