Performance and stability evaluation results

Hi all,

We have evaluated a few ingress controllers, and Kong (open source) is one of them.
The goal is to use an ingress controller for exposing REST API microservices and Node.js web portals.

The setup is the following:

  1. Test application on a VM -> Azure Load Balancer (Standard, internal) -> AKS ingress controller = Kong -> backend application in the same AKS cluster, exposed as a ClusterIP service.

  2. AKS nodes (Kubernetes 1.16.15, Ubuntu 16) are Standard D8s v3 (8 vCPUs, 32 GiB memory), and no other apps are used intensively on the cluster during the tests.

  3. Kong is deployed with the Helm chart; the following parameters are altered from the default values (DB-less deployment mode, no CPU/memory limits, autoscaling off so that single-pod performance can be measured). Kong Helm chart version: 1.12.0.

Values:

proxy:
  # Enable creating a Kubernetes service for the proxy
  enabled: true
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true" # just to expose over Azure LB
ingressController:
  livenessProbe:
    httpGet:
      path: "/healthz"
      port: 10254
      scheme: HTTP
    initialDelaySeconds: 5
    timeoutSeconds: 5
    periodSeconds: 30
    successThreshold: 1
    failureThreshold: 5
env:
  nginx_proxy_keepalive_requests: 1000

Results

The backend service is registered at ingressClass=kong for http://FQDN (not HTTPS). The backend service is really simple: a GET request just returns a ~100-byte reply containing JSON with the environment values used to start the backend pod.
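
For reference, the registration looks more or less like the manifest below (resource and service names are placeholders, so the actual manifest may differ slightly):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: env-backend                        # placeholder name
  annotations:
    kubernetes.io/ingress.class: "kong"    # route this Ingress through the Kong ingress controller
spec:
  rules:
    - host: <FQDN>
      http:
        paths:
          - path: /env
            backend:
              serviceName: env-backend     # placeholder ClusterIP service name
              servicePort: 80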

1000 connection challenge

The test iterates 100 times over the following scenario: create connections at a rate of 1000 conn/s, with each connection sending 1 message.

The goal is to see whether connections are handled correctly, without errors, by the ingress controller, and to measure the CPU/memory utilization of the ingress.
For example, connections may be kept in TIME_WAIT state and block the acceptance of new connections.
It is a kind of stability test for the ingress under heavy connection load.

The testing uses rather old tools:

iterate 100 times:
httperf --server <FQDN> --uri /env --hog --num-conn 1000 --num-cal 1 --rate 1000 --timeout 1

Results:

average: 905.202 conn/s over 100 iterations
193 client-timo errors in total over 100 iterations
193 socket-timo errors in total over 100 iterations
1 iteration dropped below a rate of 100 conn/s (and some dropped below 500)
usage: 0.16 vCPU, 380 MB memory
test duration: 5 min

I checked the testing VM, and it is able to generate a higher connection rate.
I have not found any indication of a problem in the Kong pod logs.
This gives me the impression that one Kong pod as ingress controller is able to support a maximum of around 900 conn/s.

100K messages challenge

The test opens 1 connection and sends 100K messages over it, measuring error rate and duration.

httperf --server <FQDN> --uri /env --hog --num-conn 1 --num-cal 100000 --rate 1 --timeout 5

Here I found a limit of only 1000 messages accepted per connection, with the rest declined. It is controlled by the Helm parameter:

env:
  nginx_proxy_keepalive_requests: 1000

Why is a hard limit defined for NGINX/Kong? I have not found this limit with other ingress controller providers (except NGINX). What is a production-grade value to use?

Siege test

Test:

siege -b --time=20M --concurrent=20 --log=$PWD/siege-kong.log <FQDN>/env

Result:

Transactions:            2482082 hits
Availability:             100.00 %
Elapsed time:            1199.14 secs
Data transferred:        2383.67 MB
Response time:              0.01 secs
Transaction rate:        2069.89 trans/sec
Throughput:             1.99 MB/sec
Concurrency:               18.06
Successful transactions:     2482082
Failed transactions:              72
Longest transaction:            7.11
Shortest transaction:           0.00
Usage: 0.83 vCPU, 377 MB memory

Other ingresses under the same circumstances had 0 or 1 failed transactions, and the longest transaction was 1.04 s.

Summary

My goal is not to debug why transactions failed during my tests; I used rather old tools just to check capabilities.

I would like to understand single-pod capacity limits, the bottlenecks already found, and the production-grade config to apply to Kong.

Expected capacity limits:

  • number of connections/s supported by one pod (my finding: ~900 conn/s)
  • max vCPU/memory required by one pod under heavy load (1 vCPU, 500 MB memory)
  • max throughput (my finding: 7 MB/s in/out)
  • any other limits

I really appreciate support from the Kong community.
Regards, Slawek

Using nginx_proxy_keepalive_requests: 1000 sets that NGINX-level tunable. Per the NGINX docs, the limit exists as a performance safeguard to use RAM effectively, but it’s not a hard limit on total requests; it’s a limit on the number of times a single kept-alive connection can be reused.

That should be handled gracefully by most clients, since they’ll see the connection closure and open a new one the next time they need to send a request, but it looks like --num-conn 1 for httperf won’t reopen the connection if it’s closed.

I’m unsure of the effects on memory consumption as you increase that setting indefinitely; you’d need to try it and observe what that looks like in practice. What value is appropriate depends on a variety of factors around what type of performance you’re trying to optimize for. Again, closing a kept-alive connection is generally fine, since it’s usually handled gracefully by the client, though opening a new connection adds some additional latency for a new TCP handshake; you’d probably observe that in practice as an increase in higher-percentile latencies (i.e. most requests will use a kept-alive connection, but a small fraction will need to perform a handshake again).
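
If you want to experiment with a larger value, it can be overridden through the same Helm values entry. As far as I understand the chart, entries under env become KONG_-prefixed environment variables, and the nginx_proxy_ prefix tells Kong to inject the setting as an NGINX directive into the proxy server block. The number below is only an example to test with, not a recommendation:

env:
  # Becomes KONG_NGINX_PROXY_KEEPALIVE_REQUESTS, which Kong renders as
  # "keepalive_requests 10000;" in the proxy server block of its NGINX config.
  # 10000 is an arbitrary example value; observe memory usage and tail latency
  # under your own load before settling on anything.
  nginx_proxy_keepalive_requests: 10000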

Thank you for the reply!

Do you know of any other capacity limits for Kong as an ingress controller?

Thanks, Slawek

We don’t have detailed capacity data that I’m aware of; generally our recommendation is to do more or less what you’ve done here, i.e. configure the system as you intend to use it and run some form of synthetic testing to analyze its overall performance.