Troubleshooting a ring-balancer failure

irons · January 25, 2021, 11:47pm

I’m trying to troubleshoot a “failure to get a peer from the ring-balancer” response. We’re using v1.1.0 of kong’s helm chart and kong 2.2, on an AKS cluster running 1.18.8.

The application is pretty simple — an ingress, a service, a deployment, six pods, round-robin load balancer. We have two kong ingress controllers, for dev and UAT tiers, and the same service is deployed into both tiers. In dev, we get the “failure to get a peer” error, but the UAT version of the same helm chart responds normally. We also have a second service working correctly behind both the dev and UAT ingress controllers, so the problem is specific to one service in one tier.

I’m seeing nothing untoward in the kong error logs at startup for the service or the pods, other than the 503 errors accompanying failed attempts to reach this service. Querying the pods or the service from inside the cluster returns the expected 200 response.

I fetched the details of the upstream (named “our-service-name.our-namespace-name.80.svc”) and its targets, which shows me six healthy-looking entries like this:

{
	"created_at": 1611556277.641, 
	"id": "e6fff0ef-8c2a-52ea-ac1d-34b439e8e265", 
	"tags": null, 
	"target": "100.101.50.90:3000", 
	"upstream": {
		"id": "95ba89c5-106c-5058-963c-34aae2627dcf"
	}, 
	"weight": 100
}

All six target IPs match the current pod addresses and ports. When I exec into the kong proxy pod and wget the health check endpoints, these both return 200:

wget -S -O - 100.101.50.90:3000/health/ping 
wget -S -O - our-service-name.our-service-namespace.svc.cluster.local/health/ping
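
For anyone repeating the target-versus-pod comparison above, here is a minimal Python sketch of it. It assumes the target entries have the shape shown in the JSON above (a "target" field holding "ip:port") and that the pod addresses have already been collected separately. The function name and the second pod address are hypothetical and only for illustration.

```python
def compare_targets(targets, pod_addresses):
    """Report addresses present on one side but not the other."""
    target_addrs = {t["target"] for t in targets}
    pods = set(pod_addresses)
    return {
        # Pods that Kong has no target for
        "missing_from_kong": sorted(pods - target_addrs),
        # Targets that no longer correspond to a current pod
        "stale_in_kong": sorted(target_addrs - pods),
    }

# One real entry from the output above, plus a made-up second pod address.
targets = [{"target": "100.101.50.90:3000", "weight": 100}]
pods = ["100.101.50.90:3000", "100.101.50.91:3000"]
print(compare_targets(targets, pods))
```

Two empty lists mean the targets and the pods line up, which is the situation described here.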

This seems like the definition of a healthy set of targets. What else should I be looking at to try to understand this ring-balancer error?

Thanks for your time.

traines · January 29, 2021, 5:35pm

What do you see in the proxy container’s logs with the Kong log level set to debug? Recent versions should have fairly detailed information about target resolution.

irons · January 30, 2021, 12:25am

With the log level set to debug, I’m not seeing any logging about target resolution, but it did add some suspicious messages about closed connections. Here are the four lines mentioning the IP address I was testing from:

kong-kong-846dc85d57-mc8wz proxy 2021/01/29 20:52:40 [info] 21#0: *2397 client closed connection while SSL handshaking, client: 100.101.50.206, server: 0.0.0.0:8443
kong-kong-846dc85d57-mc8wz proxy 100.101.50.206 - - [29/Jan/2021:20:52:42 +0000] "GET /health/ping HTTP/1.1" 503 58 "-" "-"
kong-kong-846dc85d57-mc8wz proxy 2021/01/29 20:52:42 [info] 21#0: *2563 client 100.101.50.206 closed keepalive connection
kong-kong-846dc85d57-jb2xg proxy 2021/01/29 20:52:43 [info] 22#0: *2888 client closed connection while waiting for request, client: 100.101.50.206, server: 0.0.0.0:8000

Unfortunately, I also see those “client closed connection” logs when I ask for debug logs in our UAT-tier kong deployment, which is not displaying any ingress misbehavior.

traines · February 3, 2021, 2:16am

What does the admin API output for /upstreams/our-service-name.our-namespace-name.80.svc/health look like on the two instances? Do you have healthchecks configured?

There aren’t any by default, but the mention of the health endpoints suggests you may have added some. Those should definitely report something in the logs, though.

irons · February 3, 2021, 7:10pm

Sorry, the health check endpoints that I was referring to are just the ones baked into our service, which return 200/OK. I haven’t configured healthchecks in kong, and didn’t even know about them until I started going down this ring-balancer rabbit hole. I can pursue that, though.

traines · February 3, 2021, 8:01pm

That health endpoint reports information whether or not healthchecks are enabled. Individual targets can have status HEALTHY, UNHEALTHY, or HEALTHCHECKS_OFF. Every target should have status HEALTHCHECKS_OFF if you haven’t configured a healthcheck.

You should only get those 503s if there are either no targets at all (because there are no ready Pods providing that Service) or if everything has status UNHEALTHY.

If that’s not actually the case, you’ll want to file a Kong bug. The balancer should be able to find a peer if it’s reporting at least one target with status HEALTHY or HEALTHCHECKS_OFF.
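
As a rough illustration of that rule, here is a minimal Python sketch that reads a saved response body from the health endpoint and checks whether any target has a usable status. It assumes the body is JSON with a top-level "data" list whose entries carry a "health" field. The function name and the sample payload are hypothetical, and any status other than the two named above is treated as unusable.

```python
import json

# Statuses that should let the balancer find a peer, per the rule above.
USABLE = {"HEALTHY", "HEALTHCHECKS_OFF"}

def summarize_upstream_health(body):
    """Count targets per status and report whether a peer should be available."""
    payload = json.loads(body)
    counts = {}
    for target in payload.get("data", []):
        # Assumption: each target entry has a "health" field.
        status = target.get("health", "UNKNOWN")
        counts[status] = counts.get(status, 0) + 1
    usable = sum(n for status, n in counts.items() if status in USABLE)
    return {"counts": counts, "usable": usable, "peer_expected": usable > 0}

# Made-up sample body; a real one would come from the admin API.
sample = json.dumps({"data": [
    {"target": "100.101.50.90:3000", "weight": 100, "health": "HEALTHCHECKS_OFF"},
    {"target": "100.101.50.91:3000", "weight": 100, "health": "UNHEALTHY"},
]})
print(summarize_upstream_health(sample))
```

If peer_expected comes back true for the dev instance while the proxy still returns 503s, that is the mismatch worth putting in a bug report.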
