We have been using Kong on our production clusters for a while. While only about 20% of our applications were running on these clusters, we had no problems.
We recently migrated the rest of our workload, and since then some of our Kong replicas have been returning this error:
failed recreating balancer for app.namespace.80.svc: timeout waiting for balancer for a9a7385e-ad84-474e-9b1c-2....
To “solve” this problem, we are forced to restart the pod that returns this error.
Unfortunately, if we don’t detect the problem quickly enough, all the pods end up with the error and 100% of our traffic is interrupted.
We did not notice any excessive RAM consumption on the pods, and the pods logged no other errors before the one quoted above.
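
Until the root cause is found, the detection step could be automated so that affected pods are spotted before the error spreads. The following is only a minimal sketch, not something we run in production: it assumes the logs of each Kong pod have already been collected (for example with `kubectl logs`), and the function name `find_affected_pods` and the pod names in the sample are hypothetical.

```python
import re

# Signature of the error quoted above. The service name and balancer ID
# vary between occurrences, so only the fixed parts of the message are matched.
BALANCER_ERROR = re.compile(
    r"failed recreating balancer for \S+: timeout waiting for balancer"
)


def find_affected_pods(logs_by_pod):
    """Return the sorted names of pods whose logs contain the balancer error.

    logs_by_pod maps a pod name to an iterable of log lines.
    """
    return sorted(
        pod
        for pod, lines in logs_by_pod.items()
        if any(BALANCER_ERROR.search(line) for line in lines)
    )


if __name__ == "__main__":
    # Hypothetical sample data; real input would come from the pods' logs.
    sample = {
        "kong-a": ["request served", "request served"],
        "kong-b": [
            "failed recreating balancer for app.namespace.80.svc: "
            "timeout waiting for balancer for a9a7385e-ad84-474e-9b1c-2....",
        ],
    }
    print(find_affected_pods(sample))
```

The returned pod names are the candidates for a restart. This only automates the workaround described above and does not address why the balancer times out.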