We have just one service and one route configured on Kong in our load-test environment.
API latency is typically around 4 ms, and we can sustain 1 million requests per minute (RPM) on 8 cores and 16 GB of memory (AWS EC2 c5.2xlarge). At this rate, CPU utilization reaches ~75%.
When we increase the load by a further 0.2M RPM (to 1.2M RPM), requests start failing with status code 502, and average latency rises from 4 ms to 300 ms. The requests that get a 502 never reach the upstream server. I believe this is caused by the "backlog" setting in the "kong.conf" file, which closes connections once requests overflow the queue. But why does the latency increase?
Of the 1.2M requests per minute, roughly 400K return status code 502 and 800K return status code 200. Since we were serving 1M RPM successfully before, this suggests the 502 requests are also degrading the 2XX requests as load increases.
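To make these figures easier to compare, here is a small sketch of the arithmetic in Python. It only uses the numbers quoted above; the variable and function names are mine, chosen for illustration.

```python
# Figures taken from this post; all are requests per minute (RPM).
BASELINE_RPM = 1_000_000      # sustained with only 2XX responses
OVERLOAD_RPM = 1_200_000      # offered load once 502s appear
OVERLOAD_502_RPM = 400_000    # approximate 502 responses at that load
OVERLOAD_200_RPM = 800_000    # approximate 200 responses at that load


def per_second(rpm: float) -> float:
    """Convert requests per minute to requests per second."""
    return rpm / 60


# Share of the offered load that fails with 502.
error_share = OVERLOAD_502_RPM / OVERLOAD_RPM

# How much successful throughput dropped compared with the 1M RPM baseline.
goodput_loss = 1 - OVERLOAD_200_RPM / BASELINE_RPM

print(f"baseline:         {per_second(BASELINE_RPM):,.0f} req/s")
print(f"overload offered: {per_second(OVERLOAD_RPM):,.0f} req/s")
print(f"overload 2XX:     {per_second(OVERLOAD_200_RPM):,.0f} req/s")
print(f"502 share:        {error_share:.1%}")
print(f"2XX loss vs base: {goodput_loss:.1%}")
```

In other words, a 20% increase in offered load leaves about a third of requests failing, and successful throughput ends up roughly 20% below what the same host served before the overload.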
The behavior we want to achieve: Kong should accept as many requests as it can handle and serve them with a 2XX status code. Any request beyond that capacity (CPU limit breached, backlog queue full, etc.) should be dropped immediately without causing a rise in latency.
Configuration we are using:
Kong instances behind an AWS ALB
nginx_main_worker_rlimit_nofile = auto
nginx_events_worker_connections = 2000
ulimit -n 100000
proxy_listen = 0.0.0.0:8000 reuseport backlog=16384
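As a rough sanity check on the connection settings above, Little's law (mean requests in flight = arrival rate x mean latency) gives an estimate of how many requests are open at once. This is only a sketch under assumptions that are not confirmed anywhere in this post: 8 nginx workers (one per core), and two connections per proxied request (one from the ALB, one to the upstream), since nginx counts proxied connections against worker_connections.

```python
WORKER_CONNECTIONS = 2000   # nginx_events_worker_connections from this post
ASSUMED_WORKERS = 8         # assumption: one worker process per core
CONNS_PER_REQUEST = 2       # assumption: one downstream + one upstream connection


def in_flight(requests_per_minute: float, latency_ms: float) -> float:
    """Little's law: mean concurrency = arrival rate x mean time in system."""
    return (requests_per_minute / 60) * (latency_ms / 1000)


# Total connection slots across workers, and how many proxied requests fit.
connection_slots = WORKER_CONNECTIONS * ASSUMED_WORKERS
request_slots = connection_slots // CONNS_PER_REQUEST

normal = in_flight(1_000_000, 4)       # 1M RPM at 4 ms
overload = in_flight(1_200_000, 300)   # 1.2M RPM at 300 ms

print(f"request slots (assumed): {request_slots}")
print(f"in flight at 1.0M RPM:   {normal:.0f} ({normal / request_slots:.1%} of slots)")
print(f"in flight at 1.2M RPM:   {overload:.0f} ({overload / request_slots:.1%} of slots)")
```

Under these assumptions, concurrency goes from under 1% of the available slots at 4 ms to a large fraction of them at 300 ms. That does not prove worker_connections is the bottleneck, but it is one reason the connection limits may matter once latency climbs.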
The AWS ALB logs show status code 502 being returned, with a target time of around 400 ms.
We have tried adjusting various tuning parameters in nginx.conf but haven't been able to get better performance. Can someone please point us in the right direction?
I'll add more details if required.