Small % of transactions have massive latency spikes

I decided to make a separate topic here to dig into what I can see going on with Kong on what looks like between 1-4% of overall transactions (even ones that stop at localhost and don't proxy to the back end, which makes me think it's NOT DNS related but rather somewhere in Kong/OpenResty/Nginx land).

Let's walk through my rationale here; first, let me explain the proxy in question. We have a proxy endpoint called "/F5/status". This proxy runs the request-termination plugin and simply returns { "message": "success" }, basically an uptime ping endpoint many of you may be familiar with implementing. As you can imagine, most calls are blazing fast, but the 1-3% that hang a long time internally are the ones I am really concerned with, as I notice similar behavior in other proxy services that actually route to a back end.
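For context, here is roughly how an endpoint like that gets wired up. This is a minimal sketch against the 0.14-era Admin API; the Admin address, service name, and placeholder upstream URL are all assumptions, and exact field shapes vary a bit across Kong versions:

```python
import requests

ADMIN = "http://localhost:8001"  # assumed default Admin API address

# A service needs an upstream URL even though request-termination
# short-circuits inside Kong, so the upstream is never actually called.
svc = requests.post(ADMIN + "/services", json={
    "name": "f5-status",              # hypothetical service name
    "url": "http://localhost:65535",  # placeholder, never reached
})
svc.raise_for_status()

# Route the ping path to that service.
route = requests.post(ADMIN + "/routes", json={
    "paths": ["/F5/status"],
    "service": {"id": svc.json()["id"]},
})
route.raise_for_status()

# Terminate every call on the service with a canned JSON body;
# config.message is what produces the {"message": "success"} response.
plugin = requests.post(ADMIN + "/plugins", json={
    "name": "request-termination",
    "service_id": svc.json()["id"],
    "config": {"status_code": 200, "message": "success"},
})
plugin.raise_for_status()
```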

A breakdown of transaction times is below; as you can see, the majority of tx are under 3 ms. But let's take a look at some of the outliers:

In a 6-hour period for this service (which gets called every 5 seconds, so roughly 4,320 calls), about 9 transactions had massive, strange latency spikes; that's around 0.2% of them.

Now, this is as detailed as Dynatrace gets at this level of the view, sadly:

Which isn’t pretty much everything Kong/OpenResty tied to this module?:

Any ideas what could be causing such an internal delay over localhost on a low % of transactions? I am using 1% or less of my available CPU/RAM, and overall most of the time it returns in low ms. Kinda grasping at straws here, as it's not easily debuggable.
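For anyone wanting to poke at this themselves: Kong stamps X-Kong-Proxy-Latency and X-Kong-Upstream-Latency headers on responses, so a small probe can at least separate Kong-internal time from upstream time on the slow calls. A rough sketch, where the proxy address and threshold are placeholders (and the upstream header won't appear on terminated endpoints like /F5/status):

```python
import time
import requests

URL = "http://localhost:8000/F5/status"  # assumed proxy address
THRESHOLD_MS = 100                       # flag calls slower than this

while True:
    start = time.perf_counter()
    resp = requests.get(URL, timeout=10)
    wall_ms = (time.perf_counter() - start) * 1000

    if wall_ms > THRESHOLD_MS:
        # Kong's own internal latency vs. time spent waiting on the
        # upstream, both reported in milliseconds by Kong itself.
        print(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            f"wall={wall_ms:.1f}ms",
            "proxy=" + resp.headers.get("X-Kong-Proxy-Latency", "?"),
            "upstream=" + resp.headers.get("X-Kong-Upstream-Latency", "?"),
        )
    time.sleep(5)
```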

Other relevant details:
My DB cache TTL is 3 hours (maybe turning the TTL off would help?). But I feel like IF it were caused by a required DB call (there are 2 Kong hosts in the cluster we are looking at), I would see 2 slow transactions every 3 hours like clockwork, and that's not really the case; it's sporadic and unpredictable across the array of proxy services I have (a quick way to check for that periodicity is sketched below).
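Here's the periodicity check I mean: if cache expiry were the cause, the gaps between spike timestamps should cluster near the 3-hour TTL (two spikes per window with 2 hosts), while scattered gaps point elsewhere. A sketch, assuming spike_times is a list of epoch seconds collected by a probe like the one above:

```python
def spike_intervals(spike_times):
    """Gaps, in minutes, between consecutive latency spikes."""
    ordered = sorted(spike_times)
    return [
        round((later - earlier) / 60, 1)
        for earlier, later in zip(ordered, ordered[1:])
    ]

# A cache-expiry problem with a 3-hour TTL should show gaps
# clustering near ~180 minutes; scattered values rule the cache out.
print(spike_intervals([0, 10_812, 21_598, 21_703]))  # made-up timestamps
```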

Other Global Plugins:
Kong HTTP Logger variant (GitHub: Optum/kong-splunk-log, a Kong plugin designed to log API transactions to Splunk)
Kong StatsD Logger
Kong Rate Limiting Plugin
Kong Upstream JWT (GitHub: Optum/kong-upstream-jwt, a plugin for Kong which adds a signed JWT to HTTP headers on backend requests)

Other relevant posts from my research thus far:

Seems setting the DB cache TTL to 0, plus the new C* LB policy, plus the Kong 0.14.0 upgrade in general, have fixed whatever was causing the 1-5% of latency spikes. It could have been the DB, but the spikes didn't seem to have much of a pattern, like I would expect for a caching problem. I can say now from my investigations that only 2 transactions out of 4,221 during the day reported gateway latency of 1+ second (about 0.05%), so I feel pretty great about those results! My ping endpoint is reporting no latency spikes during the day now, whereas I used to see 20+ of those, so this is all very encouraging. Hope others benefit from the move to Kong 0.14.0 too (until it becomes old because of new super cool features again in 1-2 months :laughing:)!
