We have been evaluating kong-ingress in our environment for some time now, but are facing some performance issues. We are hosting on AWS EKS, and our current request flow looks like:
User -> AWS ALB -> nginx ingress -> kong gateway -> apps
We are simply looking to replace nginx ingress + kong gateway with kong-ingress in DB-less mode, so that we can use a declarative approach for our Kong configuration.
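For context, by "declarative" we mean defining routes as plain Kubernetes resources instead of admin-API calls against the gateway; something like this (the app name and host are made up, purely to illustrate):

```yaml
# Hypothetical example: a route declared as an ordinary Ingress
# resource, picked up by kong-ingress in DB-less mode. The app name
# and host are made up for illustration.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-app
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: example-app
              servicePort: 80
```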
Now the problem: the Liveness and Readiness probes are frequently failing for both containers (proxy and ingress-controller) inside the kong-ingress pod, with events similar to the ones below:
```
Events:
  Type     Reason     Age                    From                                                      Message
  ----     ------     ----                   ----                                                      -------
  Warning  Unhealthy  4m34s (x82 over 3d5h)  kubelet, ip-100-64-22-175.eu-central-1.compute.internal  Liveness probe failed: Get http://100.64.30.47:9001/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  4m30s (x29 over 3d5h)  kubelet, ip-100-64-22-175.eu-central-1.compute.internal  Readiness probe failed: Get http://100.64.30.47:9001/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
```
The containers are then killed and recreated.
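If it matters, the probes hit /health on port 9001 on the pod IP, as in the events above. One stopgap we have been considering is loosening the probe timing on both containers, roughly as below; the timing values are our own guesses, not a Kong recommendation:

```yaml
# Sketch only: relax the liveness/readiness timing so a slow /health
# response is retried instead of immediately killing the container.
# All timing values here are illustrative guesses.
livenessProbe:
  httpGet:
    path: /health
    port: 9001
  initialDelaySeconds: 5
  timeoutSeconds: 5        # default is 1s, and our failures are timeouts
  periodSeconds: 10
  failureThreshold: 6      # tolerate transient slowness before a restart
readinessProbe:
  httpGet:
    path: /health
    port: 9001
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 6
```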
We thought it might be load-related and scaled up to 10 replicas, but we still hit the issue during repeated functional testing alone, with no load applied.
Secondly, we tried splitting the containers (as suggested in one of the posts here in the discussions) and ran the proxy as a DaemonSet and the ingress-controller as a quorum of 5 replicas. To our surprise, calls succeeded some of the time and failed the rest, because the Kong configuration was now getting messed up: the proxy configuration (routes and services) was out of sync among the proxies. A rough sketch of what we deployed follows.
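Roughly, the split looked like this (heavily trimmed from the all-in-one manifest; the names and labels are ours, image tags and most env/flags are omitted, and the versions are listed at the end):

```yaml
# Trimmed sketch of our split topology: proxy as a DaemonSet,
# ingress-controller as a 5-replica Deployment. Names/labels are ours;
# image tags and most configuration are omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kong-proxy
  namespace: kong
spec:
  selector:
    matchLabels:
      app: kong-proxy
  template:
    metadata:
      labels:
        app: kong-proxy
    spec:
      containers:
        - name: proxy
          image: kong          # tag omitted, see versions below
          env:
            - name: KONG_DATABASE
              value: "off"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kong-ingress-controller
  namespace: kong
spec:
  replicas: 5                  # the "quorum of 5" mentioned above
  selector:
    matchLabels:
      app: kong-ingress-controller
  template:
    metadata:
      labels:
        app: kong-ingress-controller
    spec:
      containers:
        - name: ingress-controller
          image: kong/kubernetes-ingress-controller   # tag omitted
```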
Also, looking at the metrics exported through Prometheus, we cannot make much sense of the ones under Caching, especially kong_process_events. What does this metric signify? When the containers freshly start it sits around 5%, but as soon as we run a test it jumps to 100% and stays there (we are not setting any resource requests/limits on the pod). The value we are graphing is roughly the query shown below.
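For reference, if we read the dashboard correctly, the percentage we are watching is effectively the ratio of the two shared-dict memory series from the Kong Prometheus plugin; written here as a hypothetical Prometheus recording rule (the rule name is ours):

```yaml
# Hypothetical recording rule reproducing the "Caching" panel value we
# are watching: bytes used vs. capacity of the kong_process_events
# shared dict. The rule name is ours; the metric names come from the
# Kong Prometheus plugin.
groups:
  - name: kong-shared-dicts
    rules:
      - record: kong:process_events:utilisation
        expr: |
          kong_memory_lua_shared_dict_bytes{shared_dict="kong_process_events"}
            /
          kong_memory_lua_shared_dict_total_bytes{shared_dict="kong_process_events"}
```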
So, our questions are:
- Why are the Liveness and Readiness probes failing frequently?
- Why does the proxy containers’ configuration drift out of sync after splitting the containers and running the proxy as a DaemonSet?
- What does kong_process_events signify, why does it hit 100% after a single test run, and how should we tackle this if it is a problem?
We are using the versions below (deployed using https://github.com/Kong/kubernetes-ingress-controller/blob/master/deploy/single/all-in-one-dbless.yaml):
Please do let me know if any other information is required to understand the problem.