We have evaluated a few ingress controllers, and Kong (open source) is one of them.
The goal is to use an ingress controller to expose REST API microservices and Node.js web portals.
The setup is the following:
Test application on a VM -> Azure Load Balancer (Standard, internal) -> AKS ingress controller (Kong) -> backend application in the same AKS cluster, exposed as a ClusterIP service.
The AKS nodes (Kubernetes 1.16.15, Ubuntu 16) are Standard D8s v3 (8 vCPUs, 32 GiB memory), and no other applications were putting significant load on the cluster during the tests.
Kong is deployed with the Helm chart (version 1.12.0). The following parameters differ from the defaults: DB-less mode, Deployment mode, no CPU/memory limits, and autoscaling off (to measure the performance of a single pod). Values used:
    proxy:
      # Enable creating a Kubernetes service for the proxy
      enabled: true
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-internal: "true"  # just to expose over Azure LB
    ingressController:
      livenessProbe:
        httpGet:
          path: "/healthz"
          port: 10254
          scheme: HTTP
        initialDelaySeconds: 5
        timeoutSeconds: 5
        periodSeconds: 30
        successThreshold: 1
        failureThreshold: 5
    env:
      nginx_proxy_keepalive_requests: 1000
The backend service is registered with ingress class kong for http://FQDN (plain HTTP, not HTTPS). The backend service is very simple: it answers GET requests with a 100-byte JSON reply containing the environment values used to start the backend pod.
1000 connection challenge
The test iterates the following scenario 100 times: open 1000 connections at a rate of 1000 conn/s, with each connection sending 1 message.
The goal is to check whether the ingress controller handles the connections correctly and without errors, and to measure its CPU/memory utilization.
For example, connections may be kept in the TIME_WAIT state and block new connections from being accepted.
It is a kind of stability test for the ingress under a heavy connection load.
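To see whether TIME_WAIT sockets actually pile up during a run, one can count sockets by TCP state on the machine (or network namespace) being inspected. A minimal Python sketch, assuming Linux and reading /proc/net/tcp (IPv4 only; IPv6 sockets are listed in /proc/net/tcp6). The helper names `count_tcp_states` and `read_tcp_states` are hypothetical; the state codes come from the kernel's tcp_states.h.

```python
from collections import Counter

# Hex codes used in the "st" column of /proc/net/tcp
# (Linux kernel, include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) > 3:
            state = fields[3].upper()  # 4th column is "st"
            counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_tcp_states(path: str = "/proc/net/tcp") -> Counter:
    """Read the socket table of the current network namespace (Linux only)."""
    with open(path) as f:
        return count_tcp_states(f.read())
```

Sampling `read_tcp_states()["TIME_WAIT"]` every second during an httperf run shows whether the count keeps growing; `ss -tan state time-wait` gives the same information without Python.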
Testing is done with rather old tools:
Iterated 100 times: httperf --server <FQDN> --uri /env --hog --num-conn 1000 --num-cal 1 --rate 1000 --timeout 1
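The 100-iteration loop can be scripted so that per-iteration rates and timeout counters are collected automatically. A minimal Python sketch, assuming httperf is on the PATH and prints its usual `Connection rate: ... conn/s` and `Errors: total ... client-timo ... socket-timo ...` summary lines; the regexes are an assumption and may need adjusting for a different httperf build. The helper names and the 100 conn/s "low rate" threshold are illustrative only.

```python
import re
import subprocess
from statistics import mean

# Assumed wording of two httperf summary lines; adjust if your build differs.
CONN_RATE_RE = re.compile(r"^Connection rate: ([\d.]+) conn/s", re.MULTILINE)
ERRORS_RE = re.compile(
    r"^Errors: total (\d+) client-timo (\d+) socket-timo (\d+)", re.MULTILINE
)


def parse_httperf(output: str) -> dict:
    """Extract the connection rate and timeout counters from one httperf run."""
    rate = CONN_RATE_RE.search(output)
    errors = ERRORS_RE.search(output)
    if rate is None or errors is None:
        raise ValueError("unexpected httperf output")
    return {
        "conn_rate": float(rate.group(1)),
        "client_timo": int(errors.group(2)),
        "socket_timo": int(errors.group(3)),
    }


def run_iterations(server: str, iterations: int = 100) -> list[dict]:
    """Repeat the 1000 conn/s scenario and parse each run's summary."""
    cmd = ["httperf", "--server", server, "--uri", "/env", "--hog",
           "--num-conns", "1000", "--num-calls", "1", "--rate", "1000",
           "--timeout", "1"]
    return [parse_httperf(subprocess.run(cmd, capture_output=True, text=True).stdout)
            for _ in range(iterations)]


def summarize(results: list[dict], low_rate: float = 100.0) -> dict:
    """Average rate, total timeouts, and number of runs below low_rate."""
    return {
        "avg_conn_rate": mean(r["conn_rate"] for r in results),
        "client_timo": sum(r["client_timo"] for r in results),
        "socket_timo": sum(r["socket_timo"] for r in results),
        "runs_below": sum(r["conn_rate"] < low_rate for r in results),
    }
```

The sketch spells out `--num-conns` and `--num-calls` in full; the shortened forms in the command above work because httperf's option parsing accepts unambiguous prefixes.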
Results over 100 iterations:
- average rate: 905.202 conn/s
- client-timo errors: 193 in total
- socket-timo errors: 193 in total
- in 1 iteration the rate dropped below 100 conn/s (in some others, below 500 conn/s)
- usage: vCPU 0.16, memory 380 MB
- test duration: 5 min
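For scale: each iteration attempts 1000 connections (`--num-conn 1000`), so the 100 iterations attempt 100,000 connections in total, and the 193 client timeouts correspond to an error rate of roughly 0.19%:

```latex
\text{error rate} = \frac{193}{100 \times 1000} = 0.00193 \approx 0.19\,\%
```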
I checked the test VM, and it is able to generate a higher connection rate.
I have not found any indication of a problem in the Kong pod logs.
This gives me the impression that one Kong pod acting as an ingress controller can support at most around 900 conn/s.
100K messages challenge
The test opens 1 connection and sends 100K messages over it, measuring the error rate and duration.
httperf --server <FQDN> --uri /env --hog --num-conn 1 --num-cal 100000 --rate 1 --timeout 5
Here I found a limit: only 1000 messages were accepted and the rest were declined. It is controlled by the Helm parameter:
    env:
      nginx_proxy_keepalive_requests: 1000
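The effect of this setting on the single-connection test can be modelled in a few lines. A minimal Python sketch, assuming the proxy simply closes a keep-alive connection once it has served `keepalive_requests` requests (consistent with the behaviour observed above); `simulate_keepalive` is a hypothetical model, not Kong or NGINX code.

```python
def simulate_keepalive(total_requests: int, keepalive_requests: int,
                       reconnect: bool) -> tuple[int, int]:
    """Return (requests_served, connections_opened) for a client sending
    total_requests to a server that closes each connection after
    keepalive_requests requests."""
    served = 0
    connections = 1
    on_connection = 0
    while served < total_requests:
        if on_connection == keepalive_requests:
            # The server has closed this connection after its per-connection limit.
            if not reconnect:
                break  # a single-connection client: the remaining requests fail
            connections += 1
            on_connection = 0
        served += 1
        on_connection += 1
    return served, connections


if __name__ == "__main__":
    print(simulate_keepalive(100_000, 1000, reconnect=False))  # (1000, 1)
    print(simulate_keepalive(100_000, 1000, reconnect=True))   # (100000, 100)
```

Under this model, a client that reopens the connection when it is closed gets all 100K requests through using 100 connections, whereas the `--num-conns 1` httperf run stops after the first 1000.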
Why is a hard limit defined for NGINX/Kong? I have not found such a limit in other ingress controllers (except NGINX). What value is appropriate for production?
siege -b --time=20M --concurrent=20 --log=$PWD/siege-kong.log <FQDN>/env
    Transactions:                2482082 hits
    Availability:                 100.00 %
    Elapsed time:                1199.14 secs
    Data transferred:            2383.67 MB
    Response time:                  0.01 secs
    Transaction rate:            2069.89 trans/sec
    Throughput:                     1.99 MB/sec
    Concurrency:                   18.06
    Successful transactions:     2482082
    Failed transactions:              72
    Longest transaction:            7.11
    Shortest transaction:           0.00
Usage: vCPU 0.83, memory 377 MB
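The derived figures in the siege report can be cross-checked from its raw counts. A minimal Python sketch, assuming siege computes them as shown in the comments (rate and throughput per second of elapsed time, concurrency as rate times mean response time, availability as the successful share); `siege_derived_metrics` is a hypothetical helper.

```python
def siege_derived_metrics(transactions: int, failed: int, elapsed_s: float,
                          data_mb: float, concurrency: float) -> dict:
    """Recompute summary figures from the raw numbers in a siege report."""
    return {
        # successful transactions per second of wall-clock time
        "transaction_rate": transactions / elapsed_s,
        # data transferred per second, in siege's MB units
        "throughput_mb_s": data_mb / elapsed_s,
        # Little's law: concurrency = rate * mean response time
        "mean_response_s": concurrency * elapsed_s / transactions,
        # share of successful transactions, in percent
        "availability_pct": 100.0 * transactions / (transactions + failed),
    }


if __name__ == "__main__":
    m = siege_derived_metrics(2482082, 72, 1199.14, 2383.67, 18.06)
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

With the numbers above this gives about 2069.89 trans/s, 1.99 MB/s and a mean response time of roughly 8.7 ms, and the 72 failures are small enough that availability still rounds to 100.00%. The failures show up only in the tail (longest transaction 7.11 s), not in the averages.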
Other ingress controllers under the same conditions had 1 or 0 failed transactions, and their longest transaction was 1.04 s.
My goal is not to debug why transactions failed during my tests; I used rather old tools only to check capabilities.
I would like to understand the capacity limits of one pod, the bottlenecks found so far, and the production-grade configuration to apply to Kong.
Expected capacity limits:
- number of connections/s supported by one pod (my finding: 900 conn/s)
- max vCPU/memory required by one pod under heavy load (vCPU: 1, memory: 500 MB)
- max throughput (my finding: 7 MB/s in/out)
- any other limits
Any support from the Kong community is really appreciated.