Possible memory leak - Kong 1.4 & KIC 0.6.1

  • We are running Kong and the ingress controller in OpenShift 3.9.
  • We are running in DB-less mode.
  • Even with Kong running with 4 GB of memory, the memory usage does not stop increasing.
  • We have tried disabling all plugins; memory usage still increases over time.
  • Load seems to increase the rate at which memory usage grows.

Below is a graph of memory usage.

Shared dict sizes.

Lua VM memory.

These graphs were generated under almost no load. During tests at ~400 requests/s, memory usage grows noticeably faster.
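
In case it helps anyone reproduce this kind of measurement: the same sort of numbers can be pulled from the Admin API status endpoint. Below is a minimal sketch, assuming the admin listener from the ENV dump further down (127.0.0.1:8444, self-signed cert) and assuming this Kong version exposes a "memory" section on /status; adjust as needed.

# Hedged sketch: poll Kong's Admin API /status endpoint and log whatever
# memory information it reports (Lua VMs, shared dicts) over time.
import json
import ssl
import time
import urllib.request

ADMIN_STATUS_URL = "https://127.0.0.1:8444/status"  # from KONG_ADMIN_LISTEN below

# The admin listener uses a self-signed certificate, so skip verification here.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    with urllib.request.urlopen(ADMIN_STATUS_URL, context=ctx) as resp:
        status = json.load(resp)
    # Print the reported memory section (empty if this version does not expose it).
    print(time.strftime("%H:%M:%S"), json.dumps(status.get("memory", {})))
    time.sleep(30)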

ENV from proxy:
KONG_DATABASE: off
KONG_NGINX_HTTP_CLIENT_HEADER_TIMEOUT: 5s
KONG_NGINX_HTTP_CLIENT_BODY_TIMEOUT: 5s
KONG_NGINX_HTTP_SEND_TIMEOUT: 1m
KONG_NGINX_WORKER_PROCESSES: 1
KONG_ROUTER_CONSISTENCY: eventual
KONG_MEM_CACHE_SIZE: 256m
KONG_NGINX_HTTP_INCLUDE: /kong/servers.conf
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_LISTEN: 127.0.0.1:8444 ssl
KONG_LOG_LEVEL: warn
KONG_PLUGINS: prometheus
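
(For context, the KONG_NGINX_HTTP_* entries above go through Kong's injected-directive mechanism, so they should end up as ordinary directives in the http {} block of the generated Nginx config; roughly the following, as a sketch:)

client_header_timeout 5s;
client_body_timeout 5s;
send_timeout 1m;
include /kong/servers.conf;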

Any tips on what the problem could be or how to further troubleshoot this?

We have noticed the same.

We had another issue that was resolved by upgrading to Kong 1.4 and KIC 0.6.1, but we are still seeing the constant increase in memory usage.

After further investigation, could this be expected behavior?

I’ve decreased the number of Nginx workers to 1 (to reduce memory usage).
Whenever the proxy now reaches ~99% memory usage, it seems to trigger a large garbage collection that impacts incoming requests.

I lose about 500 ms of traffic (~40 requests) during the GC. It would be nice if the GC triggered earlier and in a less severe manner. Is that something we can configure, I wonder?

I’m currently running the proxy with 1 GB of memory; maybe that is not a realistic limit.

I’ve now done tests with 4 GB of memory for the proxy, and the result is still the same: a large GC is triggered when memory reaches 100%, and a few requests are lost during it.
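
For anyone wanting to reproduce the measurement, a steady probe against the proxy that logs failed or slow requests around the GC event is enough to count the affected requests. A minimal sketch; the proxy URL, threshold, and request rate are placeholders:

# Hedged sketch: send a steady stream of requests to the proxy and log any
# failures or latency spikes, to count requests affected by the large GC.
import time
import urllib.request

PROXY_URL = "http://kong-proxy:8000/some-route"  # placeholder route behind the gateway
SLOW_THRESHOLD_S = 0.5                           # flag anything slower than ~500 ms

while True:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(PROXY_URL, timeout=5) as resp:
            resp.read()
        elapsed = time.monotonic() - start
        if elapsed > SLOW_THRESHOLD_S:
            print(f"slow response: {elapsed:.3f}s")
    except OSError as exc:  # connection errors, timeouts, and HTTP errors
        print(f"request failed after {time.monotonic() - start:.3f}s: {exc}")
    time.sleep(0.01)  # roughly 100 requests/s from a single probe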

I have done some further investigation and can see this in the log while I do some stress testing:

19-11-07 11:29:13.599	 - 	2019/11/07 11:29:13 [notice] 1#0: start worker process 15682	proxy
19-11-07 11:29:13.596	 - 	2019/11/07 11:29:13 [alert] 1#0: worker process 3572 exited on signal 9	proxy
19-11-07 11:29:13.594	 - 	2019/11/07 11:29:13 [notice] 1#0: signal 17 (SIGCHLD) received from 3572	proxy
19-11-07 11:29:13.583	 - 	E1107 11:29:13.582602       1 controller.go:132] unexpected failure updating Kong configuration: 	ingress-controller
19-11-07 11:29:13.583	 - 	posting new config to /config: making HTTP reqeust: Post https://localhost:8444/config?check_hash=1: EOF	ingress-controller

From this point until the ingress controller’s next sync iteration, I get a 404 for every request to an endpoint behind the API gateway. So this indicates that the DB-less config is somehow wiped from memory when the worker process is killed.

Thank you for the reports @goober and @niklasye!

We have taken note of these, and the team is looking into the memory leak issue.
We will post an update as soon as we have one.


Hi! It looks like there might be a solution in master now, right? https://github.com/Kong/kong/issues/5203

Do you know when we can expect a new release that includes this fix? 🙂

There are a few fixes we want to get in first, but expect a patch release soon, sometime around next week.