Small % of transactions have massive latency spikes

I decided to make a separate topic here to dig into what I can see going on with Kong on what looks like between 1-4% of overall transactions (even ones that stop at localhost and don't proxy to the back end, which makes me think it's NOT DNS related but rather somewhere in Kong/OpenResty/Nginx land).

Let's walk through my rationale here; first, let me explain the proxy in question. We have a proxy endpoint called "/F5/status". This proxy runs the request-termination plugin and simply returns { "message": "success" }, basically an uptime ping endpoint many of you may be familiar with implementing. As you can imagine, most calls are blazing fast, but the 1-3% that hang a long time internally are the ones I am really concerned with, as I notice similar behavior in other proxy services that actually route to a back end.
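For context, here is roughly how an endpoint like that gets wired up. This is a minimal sketch against the 0.14-era Admin API; the Admin address, service name, and placeholder upstream URL are all assumptions, and exact field shapes vary a bit across Kong versions:

```python
import requests

ADMIN = "http://localhost:8001"  # assumed default Admin API address

# A service needs an upstream URL even though request-termination
# short-circuits inside Kong, so the upstream is never actually called.
svc = requests.post(ADMIN + "/services", json={
    "name": "f5-status",              # hypothetical service name
    "url": "http://localhost:65535",  # placeholder, never reached
})
svc.raise_for_status()

# Route the ping path to that service.
route = requests.post(ADMIN + "/routes", json={
    "paths": ["/F5/status"],
    "service": {"id": svc.json()["id"]},
})
route.raise_for_status()

# Terminate every call on the service with a canned JSON body;
# config.message is what produces the {"message": "success"} response.
plugin = requests.post(ADMIN + "/plugins", json={
    "name": "request-termination",
    "service_id": svc.json()["id"],
    "config": {"status_code": 200, "message": "success"},
})
plugin.raise_for_status()
```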

A breakdown of transaction times is below; as you can see, the majority of tx are under 3 ms. But let's take a look at some of the outliers:

In a 6-hour period for this service (which gets called every 5 seconds, so roughly 4,320 calls), about 9 transactions had massive, strange latency spikes; that's around 0.2% of them.

Now, this is as detailed as Dynatrace gets at this level of the view, sadly:

Which isn’t pretty much everything Kong/OpenResty tied to this module?:

Any ideas what could be causing such an internal delay over localhost on a low % of transactions? I am using 1% or less of my available CPU/RAM, and overall most of the time it returns in low ms. Kinda grasping at straws here, as it's not easily debuggable.
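For anyone wanting to poke at this themselves: Kong stamps X-Kong-Proxy-Latency and X-Kong-Upstream-Latency headers on responses, so a small probe can at least separate Kong-internal time from upstream time on the slow calls. A rough sketch, where the proxy address and threshold are placeholders (and the upstream header won't appear on terminated endpoints like /F5/status):

```python
import time
import requests

URL = "http://localhost:8000/F5/status"  # assumed proxy address
THRESHOLD_MS = 100                       # flag calls slower than this

while True:
    start = time.perf_counter()
    resp = requests.get(URL, timeout=10)
    wall_ms = (time.perf_counter() - start) * 1000

    if wall_ms > THRESHOLD_MS:
        # Kong's own internal latency vs. time spent waiting on the
        # upstream, both reported in milliseconds by Kong itself.
        print(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            f"wall={wall_ms:.1f}ms",
            "proxy=" + resp.headers.get("X-Kong-Proxy-Latency", "?"),
            "upstream=" + resp.headers.get("X-Kong-Upstream-Latency", "?"),
        )
    time.sleep(5)
```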

Other relevant details:
My DB cache TTL is 3 hours (maybe turning the TTL off would help?). But I feel like IF it were caused by a required DB call (there are 2 Kong hosts in the cluster we are looking at), I would see 2 slow transactions every 3 hours like clockwork, and that's not really the case; it's sporadic and unpredictable across the array of proxy services I have (a quick way to check for that periodicity is sketched below).
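Here's the periodicity check I mean: if cache expiry were the cause, the gaps between spike timestamps should cluster near the 3-hour TTL (two spikes per window with 2 hosts), while scattered gaps point elsewhere. A sketch, assuming spike_times is a list of epoch seconds collected by a probe like the one above:

```python
def spike_intervals(spike_times):
    """Gaps, in minutes, between consecutive latency spikes."""
    ordered = sorted(spike_times)
    return [
        round((later - earlier) / 60, 1)
        for earlier, later in zip(ordered, ordered[1:])
    ]

# A cache-expiry problem with a 3-hour TTL should show gaps
# clustering near ~180 minutes; scattered values rule the cache out.
print(spike_intervals([0, 10_812, 21_598, 21_703]))  # made-up timestamps
```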

Other Global Plugins:
Kong HTTP Logger variant (GitHub: Optum/kong-splunk-log, a Kong plugin designed to log API transactions to Splunk)
Kong StatsD Logger
Kong Rate Limiting Plugin
Kong Upstream JWT (GitHub: Optum/kong-upstream-jwt, a plugin for Kong which adds a signed JWT to HTTP headers on backend requests)

Other relevant posts from my research thus far:

Seems setting the DB cache TTL to 0, plus the new C* LB policy, plus the Kong 0.14.0 upgrade in general, have fixed whatever was causing the 1-5% of latency spikes. It could have been the DB, but the spikes didn't seem to have much of a pattern, like I would expect for a caching problem. I can say now from my investigations that only 2 transactions out of 4,221 during the day reported gateway latency of 1+ second (about 0.05%), so I feel pretty great about those results! My ping endpoint is reporting no latency spikes during the day now, whereas I used to see 20+ of those, so this is all very encouraging. Hope others benefit from the move to Kong 0.14.0 too (until it becomes old because of new super cool features again in 1-2 months :laughing:)!
