Question on DB Configuration setup Kong v 0.14 + C*?

I think that in your case, because you are deploying multiple datacenters, your effective cluster size is 3, not 6 (3 nodes in each DC, each DC with its own RF). It doesn’t change much regarding C/A (consistency/availability), but it does mean that each of your nodes holds more than just 33% of your data: with an RF of 2 across 3 nodes, each node stores roughly two thirds of it.
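
To put a rough number on the per-node share, here is a minimal sketch. It assumes tokens are evenly distributed within a DC, and the function name is made up for illustration (it is not part of Kong or Cassandra):

```python
def data_share_per_node(nodes_in_dc: int, replication_factor: int) -> float:
    """Approximate fraction of the keyspace stored by each node of one DC.

    Assumption: tokens are evenly distributed, so every node owns an equal
    slice of the ring and additionally holds replicas for its neighbours.
    """
    if nodes_in_dc < 1 or replication_factor < 1:
        raise ValueError("nodes_in_dc and replication_factor must be >= 1")
    # A row cannot have more copies in a DC than there are nodes in that DC.
    effective_rf = min(replication_factor, nodes_in_dc)
    return effective_rf / nodes_in_dc


if __name__ == "__main__":
    for rf in (1, 2, 3):
        share = data_share_per_node(3, rf)
        print(f"3 nodes per DC, RF={rf}: ~{share:.0%} of the data per node")
```

With 3 nodes per DC, an RF of 1 gives each node about a third of the data, an RF of 2 about two thirds, and an RF of 3 a full copy.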

It seems to me like you have two problems here:

  1. The OAuth2 plugin does a read-upon-write. Because of Cassandra’s eventually consistent nature and the LB policy in effect, that read occasionally hits a node that does not have the row yet, and the plugin fails.
  2. Increasing the consistency setting fixes 1., but with your current RF it means that you can no longer survive the loss of a node.

I see two options:

  1. Using a consistency of ONE and the request-aware LB policy should allow you to keep an RF of 2 and survive the loss of a node. In principle, it also makes reads that follow an insert go to the same node, which would avoid the consistency issue, but that last part is not strictly guaranteed.
    The request-aware LB policy does not guarantee that the same node will systematically be used in subsequent queries: if the node becomes unreachable between 2 queries, the policy falls back to another node, in a round-robin fashion. Even with an RF of 2, that other node might not have received the replicated token yet.
  2. Increasing the consistency to LOCAL_QUORUM, but also increasing the RF to 3 in order to be able to survive the loss of a node. To survive the loss of more than one node, you’d need a larger cluster.
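
The arithmetic behind both options can be sketched as follows. This is only an illustration: it assumes the per-DC RF, covers only ONE and LOCAL_QUORUM, and uses hypothetical function names. The quorum size is the usual `floor(RF / 2) + 1`.

```python
def replicas_required(replication_factor: int, consistency: str) -> int:
    """Replicas in the local DC that must answer for a query to succeed."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    if consistency == "ONE":
        return 1
    if consistency == "LOCAL_QUORUM":
        # Majority of the replicas in the local datacenter.
        return replication_factor // 2 + 1
    raise ValueError(f"consistency level not covered by this sketch: {consistency}")


def tolerated_replica_losses(replication_factor: int, consistency: str) -> int:
    """How many replicas of a row can be down while queries still succeed."""
    return replication_factor - replicas_required(replication_factor, consistency)


if __name__ == "__main__":
    for rf, level in ((2, "ONE"), (2, "LOCAL_QUORUM"), (3, "LOCAL_QUORUM")):
        losses = tolerated_replica_losses(rf, level)
        print(f"RF={rf}, {level}: tolerates {losses} unavailable replica(s)")
```

With an RF of 2, LOCAL_QUORUM needs both replicas, so no replica can be lost. With an RF of 3 it needs 2 of 3, so one can be lost. ONE with an RF of 2 also tolerates one loss, but without the read-after-write guarantee.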

In the context of Kong, which isn’t a very write-oriented application, and considering your clusters are relatively small, I think setting your RF to 3 would be fine. Of course, you know better than I do the size of your dataset, which comprises all your entities (consumers, oauth2_tokens, rate-limiting rows, etc…), and the available storage on your nodes.

That said, a consistency of ONE is slightly more performant, and isn’t as disruptive to your current deployment. The likelihood of a node becoming unreachable between the insert and the subsequent read (with the write not yet replicated to the fallback node) is small, but should be assessed as a risk.
