Troubleshooting a ring-balancer failure

I’m trying to troubleshoot a “failure to get a peer from the ring-balancer” response. We’re using v1.1.0 of kong’s helm chart and kong 2.2, on an AKS cluster running 1.18.8.

The application is pretty simple — an ingress, a service, a deployment, six pods, round-robin load balancer. We have two kong ingress controllers, for dev and UAT tiers, and the same service is deployed into both tiers. In dev, we get the “failure to get a peer” error, but the UAT version of the same helm chart responds normally. We also have a second service working correctly behind both the dev and UAT ingress controllers, so the problem is specific to one service in one tier.

I’m seeing nothing untoward in the kong error logs at startup for the service or the pods, other than the 503 errors accompanying failed attempts to reach this service. Querying the pods or the service from inside the cluster returns the expected 200 response.

I fetched the details of the upstream (named “our-service-name.our-namespace-name.80.svc”) and its targets, which show six healthy-looking entries like this:

	"created_at": 1611556277.641,
	"id": "e6fff0ef-8c2a-52ea-ac1d-34b439e8e265",
	"tags": null,
	"target": "",
	"upstream": {
		"id": "95ba89c5-106c-5058-963c-34aae2627dcf"
	},
	"weight": 100

All six target IPs match the current pod addresses and ports. When I exec into the kong proxy pod and wget the health check endpoints, these both return 200:

wget -S -O - 
wget -S -O - our-service-name.our-service-namespace.svc.cluster.local/health/ping

This seems like the definition of a healthy set of targets. What else should I be looking at to try to understand this ring-balancer error?

Thanks for your time.

What do you see in the proxy container’s logs with the Kong log level set to debug? Recent versions should have fairly detailed information about target resolution.

With the log level set to debug, I’m not seeing any logging about target resolution, but it did add some suspicious messages about closed connections. Here are the four lines mentioning the IP address I was testing from:

kong-kong-846dc85d57-mc8wz proxy 2021/01/29 20:52:40 [info] 21#0: *2397 client closed connection while SSL handshaking, client:, server:
kong-kong-846dc85d57-mc8wz proxy - - [29/Jan/2021:20:52:42 +0000] "GET /health/ping HTTP/1.1" 503 58 "-" "-"
kong-kong-846dc85d57-mc8wz proxy 2021/01/29 20:52:42 [info] 21#0: *2563 client closed keepalive connection
kong-kong-846dc85d57-jb2xg proxy 2021/01/29 20:52:43 [info] 22#0: *2888 client closed connection while waiting for request, client:, server:

Unfortunately, I also see those “client closed connection” logs when I ask for debug logs in our UAT-tier kong deployment, which is not displaying any ingress misbehavior.

What does the admin API output for /upstreams/our-service-name.our-namespace-name.80.svc/health look like on the two instances? Do you have healthchecks configured?

There aren’t any by default, but the mention of the health endpoints suggests you may have added some. Those should definitely report something in the logs, though.

Sorry, the health check endpoints that I was referring to are just the ones baked into our service, which return 200/OK. I haven’t configured healthchecks in kong, and didn’t even know about them until I started going down this ring-balancer rabbit hole. I can pursue that, though.

That health endpoint reports information whether or not healthchecks are enabled. Individual targets can have status HEALTHY, UNHEALTHY, or HEALTHCHECKS_OFF. Every target should have status HEALTHCHECKS_OFF if you haven’t configured a healthcheck.

You should only get those 503s if there are either no targets at all (because there are no ready Pods providing that Service) or if everything has status UNHEALTHY.

If that’s not actually the case, you’ll want to file a Kong bug. The balancer should be able to find a peer if it’s reporting at least one target with status HEALTHY or HEALTHCHECKS_OFF.
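To compare the dev and UAT instances quickly, the rule above can be sketched as a small check over the health output. This is only a sketch: it assumes the /health response is JSON with a "data" list whose entries each carry a "health" field, and the function name summarize_upstream_health is made up for illustration. It treats any status other than HEALTHY or HEALTHCHECKS_OFF as unusable.

```python
from collections import Counter

# Statuses that, per the rule described above, should let the balancer pick a peer.
USABLE_STATUSES = {"HEALTHY", "HEALTHCHECKS_OFF"}


def summarize_upstream_health(body):
    """Hypothetical helper: summarize a parsed /upstreams/<name>/health response.

    Assumes `body` is a dict with a "data" list of targets, each with a
    "health" string. Targets without that field are counted as "UNKNOWN".
    """
    targets = body.get("data") or []
    by_status = Counter(t.get("health", "UNKNOWN") for t in targets)
    usable = sum(n for status, n in by_status.items() if status in USABLE_STATUSES)
    return {
        "total": len(targets),
        "by_status": dict(by_status),
        "usable": usable,
        # A 503 from the ring-balancer is only expected when nothing is usable.
        "expect_503": usable == 0,
    }
```

Save the /health response from each instance, parse it with json.load, and pass the result to this function. If dev reports expect_503 as False and still returns 503s, that points toward the bug-report case rather than a configuration problem.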