Question on DB configuration setup for Kong v0.14 + C*?

For context: 2 DCs, with 3 C* nodes in each DC and a Kong node in each DC as well.

In the past we ran

KONG_CASSANDRA_CONSISTENCY: ONE
KONG_CASSANDRA_LB_POLICY: DCAwareRoundRobin

I found problems with that setup, mentioned here, around C*'s eventual-consistency behavior:

Now Kong has added a new LB policy to the fray, RequestDCAwareRoundRobin, which to the best of my knowledge means that a transaction, as it is being processed, sticks to the same C* node throughout its execution.

We have lately been running:

KONG_CASSANDRA_CONSISTENCY: LOCAL_QUORUM
KONG_CASSANDRA_LB_POLICY: RequestDCAwareRoundRobin

This seems to fix all the consistency-related errors, but it introduced us to a problem today: 1 of the 3 C* nodes went down in a DC (thanks to chef-client automation gone awry, spawning multiple instances until our VM died), and Kong in one of our data centers began having downtime with errors such as:

2018/08/07 13:23:40 [error] 31#0: *1888047 [lua] responses.lua:121: send(): [cassandra error] [Unavailable exception] Cannot achieve consistency level LOCAL_QUORUM

It is not ideal that the loss of 1 node can take down a DC, so I would like to revisit this:

Would it make sense that the optimal Kong Cassandra config could now be achieved as:

KONG_CASSANDRA_CONSISTENCY: ONE
KONG_CASSANDRA_LB_POLICY: RequestDCAwareRoundRobin

Switching back to ONE gives the benefit of being able to lose 2 nodes in a DC, and RequestDCAwareRoundRobin should, in theory, make the eventual-consistency problem go away, since a single transaction (say an OAuth2 transaction) will route through the same node throughout?

Thanks!
-Jeremy


Tests with

KONG_CASSANDRA_CONSISTENCY: ONE
KONG_CASSANDRA_LB_POLICY: RequestDCAwareRoundRobin

Did not show any improvement over the failure rate of the older pattern (
KONG_CASSANDRA_CONSISTENCY: ONE
KONG_CASSANDRA_LB_POLICY: DCAwareRoundRobin)

I was really hoping that was going to be the fix. I may have to go back to the patchwork code I wrote into the OAuth plugin (from my GitHub issue) that does retry logic before failing out:

The guy in the post brought up a point about returning the data from the create as opposed to doing a find on C*. That would probably solve the errors, BUT if the token is not actually in the DB yet, then if you are clustered, other Kong nodes will not have that OAuth token on subsequent calls and the client will get a 401 rejection. It could be a possibility, though (as I am finding all the tokens actually do make it eventually #eventualConsistency)…

I would definitely prefer a ONE setting, which allows the loss of nodes, plus some OAuth2 retry logic, over 100% stability in, say, token generation with LOCAL_QUORUM but with the downside of no resiliency to C* downtime.

Same error for reference:

2018/08/09 16:44:43 [error] 37#0: *42506 [lua] responses.lua:121: access(): /usr/local/share/lua/5.1/kong/plugins/oauth2/access.lua:74: attempt to index local 'token' (a nil value), client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"
2018/08/09 16:44:44 [error] 36#0: *43111 [lua] handler.lua:181: [events] missing entity in crud subscriber, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"
2018/08/09 16:44:44 [error] 36#0: *43111 lua coroutine: runtime error: /usr/local/share/lua/5.1/kong/plugins/oauth2/access.lua:74: attempt to index local 'token' (a nil value)
stack traceback:
coroutine 0:
	/usr/local/share/lua/5.1/kong/plugins/oauth2/access.lua: in function 'generate_token'
	/usr/local/share/lua/5.1/kong/plugins/oauth2/access.lua:355: in function 'execute'
	/usr/local/share/lua/5.1/kong/plugins/oauth2/handler.lua:12: in function </usr/local/share/lua/5.1/kong/plugins/oauth2/handler.lua:10>
coroutine 1:
	[C]: in function 'resume'
	coroutine.wrap:21: in function <coroutine.wrap:21>
	/usr/local/share/lua/5.1/kong/init.lua:468: in function 'access'
access_by_lua(nginx.conf:117):2: in function <access_by_lua(nginx.conf:117):1>, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"

Another factor that plays into the availability/consistency equation of Cassandra is the configured replication factor.

Since you are using a multi-DC cluster, I suspect that you are using the NetworkTopologyStrategy, and that you specified a number of replicas per datacenter (likely the same for each). What is this number? If you depend on Kong to create the keyspace for you, it is the number you find in the cassandra_data_centers setting. If not, you must have specified it while executing your CREATE KEYSPACE query. In any case, you can also find out by running describe keyspace <kong keyspace> via cqlsh.
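
For example, something along these lines should show it (assuming the default keyspace name of kong; substitute whatever you set via cassandra_keyspace):

# Print the CREATE KEYSPACE statement, including the per-DC replication factors
cqlsh -e "DESCRIBE KEYSPACE kong" | grep replication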

If you play with this tool:

https://www.ecyrd.com/cassandracalculator/

You will notice that the replication factor can have a non-negligible impact on the availability of your cluster.

Hope that helps. If not, please keep us updated.


PS: on another note, this error:

kong/plugins/oauth2/access.lua:74: attempt to index local 'token' (a nil value)

Really shouldn't happen and is considered a bug nonetheless (a Lua-thrown error). We would most certainly accept a contribution to fix it, and return a more appropriate error/status code to the client (500 is never an appropriate status code, as you might expect) :slight_smile:

@thibaultcha thanks for the reply,

I did look over that tool and we went with
kong_env | True | {'DC1': '2', 'DC2': '2', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}

Edit - You are right, this would have been me; I forgot to think about it on a per-DC basis.

Are there standards Kong recommends around replication factor?

I think that in your case, because you are deploying multiple datacenters, your cluster size is 3, not 6 (3 nodes in each DC, each with their own RF). It doesn’t change much regarding C/A (consistency/availability), but it does mean that each of your nodes holds more than just 33% of your data.
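
If you want to double-check, nodetool can report the effective ownership per node for the Kong keyspace (assuming it is named kong); with an RF of 2 across 3 nodes per DC, each node should show roughly 66% ownership rather than 33%:

# Effective data ownership per node, computed against the kong keyspace
nodetool status kong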

It seems to me like you have two problems here:

  1. The OAuth2 plugin does a read-upon-write, and due to the eventually consistent nature of Cassandra and the LB policy in effect, there are occasional failures in the plugin.
  2. Increasing the consistency setting fixes 1., but because of your RF setting it means that you cannot survive the loss of a node anymore (with an RF of 2, LOCAL_QUORUM requires floor(2/2) + 1 = 2 replicas to answer, i.e. both of them).

I see two options:

  1. Using a consistency of ONE and the request-aware LB policy should allow you to keep an RF of 2, survive the loss of a node, and ensure that subsequent reads from an insert are done on the same node, thus avoiding potential consistency issues -> not true, see below.
    However, the request-aware LB policy does not guarantee that the same node will systematically be used in subsequent queries: if the node becomes unreachable between 2 queries, the policy falls back to another node, in a round-robin fashion. Even with an RF of 2, the other node might not have received the token from the C* gossip yet.
  2. Increasing the consistency to LOCAL_QUORUM, but also increasing the RF to 3 in order to be able to survive the loss of a node (LOCAL_QUORUM then only needs 2 of the 3 replicas). To survive the loss of more than one node, you’d need a larger cluster.

In the context of Kong, which isn’t a very write-oriented application, and considering your clusters are relatively small, I think setting your RF to 3 would be fine. Of course, you know better the size of your dataset, which comprises all your entities (consumers, oauth2_tokens, rate-limiting rows, etc…) and the available storage on your nodes.
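
For reference, bumping the RF would look roughly like this (a sketch only; the keyspace and datacenter names are taken from what you pasted above and may differ in your setup, and a full repair is needed afterwards so existing rows are streamed to the new replicas):

cqlsh -e "ALTER KEYSPACE kong WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}"
# run on each node so existing data reaches its new replica
nodetool repair -full kong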

That said, a consistency of ONE is slightly more performant, and isn’t as disruptive to your current deployment. The likelihood of a node becoming unreachable between the insert and the subsequent read operation (plus the gossiping not yet having propagated) is small, but should be assessed as a risk.


@thibaultcha

Agree on all points here, but I have some concern with one snippet, specifically your first point:

Using a consistency of ONE and the request-aware LB policy should allow you to keep an RF of 2, survive the loss of a node, and ensure that subsequent reads from an insert are done on the same node, thus avoiding potential consistency issues. However, the request-aware LB policy does not guarantee that the same node will systematically be used in subsequent queries: if the node becomes unreachable between 2 queries, the policy falls back to another node, in a round-robin fashion. Even with an RF of 2, the other node might not have received the token from the C* gossip yet.

When testing this policy today, we did not seem to see this behavior: our C* nodes do not appear unreachable, and no JMX alerts are thrown between subsequent queries (such as read/write timeouts or unavailable exceptions), yet the failures still persist. These C* nodes have adequate CPU/RAM each (2 CPU / 8GB RAM + 30GB /data storage exclusive to the Kong C* cluster) and currently take little to no query traffic in any of our testing environments. I will spend some time running in debug to see if I can capture more logs to prove that this behavior may need review. I think for now we will probably go with the 2nd option, as the ball is in our court to make that change ASAP; debugging point 1 and proving that the code, or somehow C* itself, truly has a problem on, say, a subsequent read-upon-write query (and is not throwing an alert/log about it?) would be a much more intensive endeavor.

Really appreciate the time spent discussing this; glad your thoughts at least were in line with where I was going. In general, replicating data on all nodes is not a huge concern IMO, because we are talking sub-5k proxies / sub-1-2k consumers max over the long haul; Kong has a relatively small DB data footprint.

Thanks!
-Jeremy

Giving some more thought to this, I made a wrong assumption in the first proposal. The issue is that the load balancing policy is round-robin, and not hint-based. Basically, the chosen host is not necessarily the host holding the data, and thus may act as a coordinator: forwarding the query to one of the appropriate replica nodes for this given partition key (in your case, one of two nodes since your RF is 2). So using the request-aware LB policy does not mean that the value will necessarily be read from the same node in subsequent queries: the coordinator will be the same, but it will itself round-robin across replicas, and the second replica might not have received the value from the gossiping yet.

After all, the request-aware LB policy is really meant to avoid spurious new connections being opened during the processing of a given request, and to take advantage of the quasi-guarantee that the first opened connection will still be alive a few ms later, while the same request is still being processed. This usually helps with reducing P99 latency.


Ah bingo, that makes more sense. So really, to be 100% fault tolerant with the current Kong OAuth implementation, we do need to employ option 2 (assuming no C* issues). I think that clarifies it for me :slight_smile: . Maybe I can spend some time and think up a good way to get this verbiage into the official Kong documentation so others don’t run into this issue! It is good to think about in general, and it certainly helped clarify things for me; Cassandra definitely adds complexity for all its benefits :slight_smile: .

I did try out RF 3 and LOCAL_QUORUM today, and saw a few logs around it that occurred just once during 3-4 test waves of 100+ TPS against token creation:

2018/08/10 18:26:21 [error] 32#0: *6883 [lua] handler.lua:218: [events] could not broadcast crud event: nil, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"
2018/08/10 18:26:21 [error] 32#0: *6881 [lua] handler.lua:218: [events] could not broadcast crud event: nil, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"
2018/08/10 18:33:22 [error] 37#0: *27916 [lua] handler.lua:218: [events] could not broadcast crud event: nil, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"
2018/08/10 18:33:26 [error] 37#0: *31278 [lua] handler.lua:218: [events] could not broadcast crud event: nil, client: 10.x.x.x, server: kong, request: "POST /auth/oauth2/token HTTP/1.1", host: "gateway.company.com"

But my HTTP logger is not reporting anything other than 200s on token responses. The above logs make me think maybe those specific transactions would have thrown the generic 500 and possibly failed to propagate the token? I see that the section of code where the failure occurred deals with this, and it seems to be about routes specifically? Should I be concerned?

        -- new DB module and old DAO: public worker events propagation

        local entity_channel           = data.schema.table or data.schema.name
        local entity_operation_channel = fmt("%s:%s", data.schema.table,
                                             data.operation)

        -- crud:routes
        local ok, err = worker_events.post_local("crud", entity_channel, data)
        if not ok then
          log(ngx.ERR, "[events] could not broadcast crud event: ", err)
          return
        end

I can say that with the RF of 3 and LOCAL_QUORUM, Kong can sustain 1 node down in each DC, as predicted/expected. That certainly helps!

Thanks for checking back in.

Those logs should be harmless; they are false negatives. That’s actually a nice catch! I opened a PR to get rid of them here:

There should be no real impact on your workload or token generation and such, as you observed. Keep letting us know how this new setup goes!

Cheers,


Happy to hear it’s harmless, and great to see another bug squashed!

Thanks again.