Possible memory leak - Kong 1.4 & KIC 0.6.1

  • We are running Kong and the ingress controller in OpenShift 3.9.
  • We are running in DB-less mode.
  • Even with Kong running with 4 GB of memory, the memory usage does not stop increasing.
  • We have tried disabling all plugins; memory usage still increases over time.
  • Load seems to increase the rate at which memory usage grows.

Below is a graph of memory usage.

Shared dict sizes.

Lua VM memory.

These graphs were generated under almost no load. During tests at ~400 requests/s, memory usage grows noticeably faster.
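
In case it helps anyone reproduce this kind of measurement: the same sort of numbers can be pulled from the Admin API status endpoint. Below is a minimal sketch, assuming the admin listener from the ENV dump further down (127.0.0.1:8444, self-signed cert) and assuming this Kong version exposes a "memory" section on /status; adjust as needed.

# Hedged sketch: poll Kong's Admin API /status endpoint and log whatever
# memory information it reports (Lua VMs, shared dicts) over time.
import json
import ssl
import time
import urllib.request

ADMIN_STATUS_URL = "https://127.0.0.1:8444/status"  # from KONG_ADMIN_LISTEN below

# The admin listener uses a self-signed certificate, so skip verification here.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    with urllib.request.urlopen(ADMIN_STATUS_URL, context=ctx) as resp:
        status = json.load(resp)
    # Print the reported memory section (empty if this version does not expose it).
    print(time.strftime("%H:%M:%S"), json.dumps(status.get("memory", {})))
    time.sleep(30)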

ENV from proxy:
KONG_DATABASE: off
KONG_NGINX_HTTP_CLIENT_HEADER_TIMEOUT: 5s
KONG_NGINX_HTTP_CLIENT_BODY_TIMEOUT: 5s
KONG_NGINX_HTTP_SEND_TIMEOUT: 1m
KONG_NGINX_WORKER_PROCESSES: 1
KONG_ROUTER_CONSISTENCY: eventual
KONG_MEM_CACHE_SIZE: 256m
KONG_NGINX_HTTP_INCLUDE: /kong/servers.conf
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_LISTEN: 127.0.0.1:8444 ssl
KONG_LOG_LEVEL: warn
KONG_PLUGINS: prometheus
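
(For context, the KONG_NGINX_HTTP_* entries above go through Kong's injected-directive mechanism, so they should end up as ordinary directives in the http {} block of the generated Nginx config; roughly the following, as a sketch:)

client_header_timeout 5s;
client_body_timeout 5s;
send_timeout 1m;
include /kong/servers.conf;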

Any tips on what the problem could be or how to further troubleshoot this?

We have noticed the same.

We had another issue that was resolved by upgrading to Kong 1.4 and KIC 0.6.1, but we are still seeing the constant increase in memory usage.

After further investigation, could this be expected behavior?

I’ve decreased the number of Nginx workers to 1 (to reduce memory usage).
Whenever the proxy now reaches ~99% memory usage, it seems to trigger a large garbage collection that impacts incoming requests.

I lose about 500 ms of traffic (~40 requests) during the GC. It would be nice if the GC triggered earlier and in a less severe manner. Is that something we can configure, I wonder?

I’m currently running the proxy with 1 GB of memory; maybe that is not a realistic limit.

I’ve now done tests with 4 GB of memory for the proxy, and the result is still the same: a large GC is triggered when memory reaches 100%, and a few requests are lost during it.
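
For anyone wanting to reproduce the measurement, a steady probe against the proxy that logs failed or slow requests around the GC event is enough to count the affected requests. A minimal sketch; the proxy URL, threshold, and request rate are placeholders:

# Hedged sketch: send a steady stream of requests to the proxy and log any
# failures or latency spikes, to count requests affected by the large GC.
import time
import urllib.request

PROXY_URL = "http://kong-proxy:8000/some-route"  # placeholder route behind the gateway
SLOW_THRESHOLD_S = 0.5                           # flag anything slower than ~500 ms

while True:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(PROXY_URL, timeout=5) as resp:
            resp.read()
        elapsed = time.monotonic() - start
        if elapsed > SLOW_THRESHOLD_S:
            print(f"slow response: {elapsed:.3f}s")
    except OSError as exc:  # connection errors, timeouts, and HTTP errors
        print(f"request failed after {time.monotonic() - start:.3f}s: {exc}")
    time.sleep(0.01)  # roughly 100 requests/s from a single probe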

I have done some further investigation and can see this in the log while I do some stress testing:

19-11-07 11:29:13.599	 - 	2019/11/07 11:29:13 [notice] 1#0: start worker process 15682	proxy
19-11-07 11:29:13.596	 - 	2019/11/07 11:29:13 [alert] 1#0: worker process 3572 exited on signal 9	proxy
19-11-07 11:29:13.594	 - 	2019/11/07 11:29:13 [notice] 1#0: signal 17 (SIGCHLD) received from 3572	proxy
19-11-07 11:29:13.583	 - 	E1107 11:29:13.582602       1 controller.go:132] unexpected failure updating Kong configuration: 	ingress-controller
19-11-07 11:29:13.583	 - 	posting new config to /config: making HTTP reqeust: Post https://localhost:8444/config?check_hash=1: EOF	ingress-controller

From this point until the ingress controller’s next sync iteration, I get a 404 for every request to an endpoint behind the API gateway. So this indicates that the DB-less config is somehow wiped from memory when the worker process is killed.

Thank you for the reports @goober and @niklasye!

We have taken note of these, and the team is looking into the memory leak issue.
We will post an update as soon as we have one.


Hi! It looks like there might be a solution in master now, right? https://github.com/Kong/kong/issues/5203

Do you know when we can expect a new release that includes this fix? 🙂

There are a few fixes we want to get in first, but expect a patch release soon, sometime around next week.