Upstreams and failover


From one of the older posts, “Failover API handling”:

Unfortunately, we do not observe this behavior.
We run Kong 0.14.1.
Our setup:
K1, K2, K3: three Kong instances behind the load balancer LB, which exposes DNS ‘’

We have an upstream U1 configured with 3 targets of equal weight: T1, T2, T3.

We have a plugin on Service S1 that essentially calls set_upstream(U1) with the name of the configured upstream.

We send requests to S1 and, while they are running, we terminate the service on T1, so T1 returns 502.

Given the service retry policy of 5 and the two remaining healthy instances, I would expect to see no errors in the client,
but in reality I do see some errors in the client. Is this a bug in the load balancer?
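
For anyone trying to reproduce this, the setup above could be written down roughly as the Admin API calls below. This is only a sketch: the target addresses and ports are made up, `slots=100` is taken from the numbers used later in this thread, and pointing the Service `host` at the upstream name stands in for what the plugin does with set_upstream(U1). The script only builds and prints the requests; it does not send anything.

```python
import json

# Hypothetical target addresses; the thread does not give the real ones.
TARGETS = {
    "T1": "10.0.0.1:8080",
    "T2": "10.0.0.2:8080",
    "T3": "10.0.0.3:8080",
}


def admin_requests(upstream="U1", service="S1", retries=5, slots=100):
    """Return (method, path, json_body) tuples for the Kong Admin API."""
    reqs = [("POST", "/upstreams", {"name": upstream, "slots": slots})]
    for addr in TARGETS.values():
        # Same weight on every target, as described in the post.
        reqs.append(
            ("POST", f"/upstreams/{upstream}/targets", {"target": addr, "weight": 100})
        )
    # Assumption: the Service host is the upstream name, instead of the plugin.
    reqs.append(
        (
            "POST",
            "/services",
            {"name": service, "protocol": "http", "host": upstream, "port": 80, "retries": retries},
        )
    )
    return reqs


for method, path, body in admin_requests():
    print(method, path, json.dumps(body))
```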



Hmm, I believe passive health checks guarantee that __ transactions fail before a target is marked down. Active health checks hit an endpoint every __ seconds, and if it reports down __ consecutive times the target is marked down. I think you want passive health checks if you want the client to see only ___ failures before the target is marked down.

Another important point with the healthcheck lib that I haven’t looked into is whether its state is global or local to each worker. If it is global you would not see many failures. If it is per worker, or if it takes time for the workers to reach consensus on the number of errors, then you would see discrepancies in the client error count even with passive health checks.
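
For reference, both kinds of checks are configured on the upstream object under `healthchecks`. In the Kong documentation, active checks probe an endpoint on a timer, while passive checks count failures on the traffic being proxied. A sketch of what the body could look like is below; the `/health` path and every threshold are placeholder values, not recommendations, and the thread does not say which values (if any) were used.

```python
import json

# Placeholder thresholds and path; pick real values for your own setup.
healthchecks = {
    "active": {
        "http_path": "/health",  # hypothetical probe endpoint
        "healthy": {"interval": 5, "successes": 2},
        "unhealthy": {"interval": 5, "http_failures": 3, "tcp_failures": 3, "timeouts": 3},
    },
    "passive": {
        # Counters fed by real proxied requests, not by probes.
        "healthy": {"successes": 2},
        "unhealthy": {"http_failures": 3, "tcp_failures": 3, "timeouts": 3},
    },
}

# Body that could be sent when creating or updating upstream U1.
body = {"name": "U1", "healthchecks": healthchecks}
print(json.dumps(body, indent=2))
```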



I have a discovery service that actively monitors the servers and a Kong plugin that updates the upstream/targets based on that, so I really don’t want yet another active monitor. Even if the health check is per worker, I think the logic should not be affected, and I believe health checks are not related to this.

Suppose I have 3 servers and 100 slots. According to the documentation, Kong creates a 100-entry list with a 33% distribution and uses round-robin, so if I happen to hit a dead server it should retry on the next entry, up to the max number of retries, and this is (or should be) done in the context of the same worker. The probability of hitting a dead server 5 times in a row in this case is negligible.
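
To make that mental model concrete, here is a small simulation of it. It is a toy model, not Kong's balancer: it assumes a single worker, a shuffled wheel of 100 slots, one pointer that advances by one slot per attempt, and that every failed attempt is retried on the next slot. All function names are made up for this sketch. Under these assumptions a request fails only when the wheel contains a run of dead slots at least as long as the number of attempts.

```python
import random


def build_wheel(targets, slots, seed=0):
    """Spread `slots` positions over the targets as evenly as possible, then shuffle."""
    wheel = [targets[i % len(targets)] for i in range(slots)]
    random.Random(seed).shuffle(wheel)
    return wheel


def longest_dead_run(wheel, dead):
    """Longest circular run of consecutive slots that belong to dead targets."""
    if all(t in dead for t in wheel):
        return len(wheel)
    best = run = 0
    for t in wheel + wheel:  # doubled list so runs across the wrap-around count
        run = run + 1 if t in dead else 0
        best = max(best, run)
    return best


def count_failures(wheel, dead, attempts, requests):
    """Walk the wheel round-robin; a request fails if all its attempts hit dead targets."""
    pos = failures = 0
    for _request in range(requests):
        for _attempt in range(attempts):
            target = wheel[pos % len(wheel)]
            pos += 1
            if target not in dead:
                break  # this attempt reached a live target
        else:
            failures += 1  # every attempt landed on a dead target
    return failures


wheel = build_wheel(["T1", "T2", "T3"], 100)
run = longest_dead_run(wheel, {"T1"})
print("longest run of T1 slots:", run)
print("failed requests with 5 attempts:", count_failures(wheel, {"T1"}, 5, 100_000))
```

With several workers or concurrent requests sharing a pointer, consecutive attempts of one request no longer see consecutive slots, so the real numbers can differ from this model.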

Unless I don’t understand how that works.



What errors are you getting? What’s in the Kong logs?



In my test I have 3 targets; 2 of the 3 were returning 500 and the other was returning 200.

I send ~1M requests and I see 500s returned to my client; Kong lists 500 in its access logs.

My understanding is that I should not see any errors, since the number of retries I have is 5, and 5 > 3.
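
A quick back-of-the-envelope check of the "5 > 3" argument: it only holds if each retry is guaranteed to go to a different target. If instead each attempt picks a target independently with equal probability, the chance that every attempt hits a bad target is (bad/total)^attempts. The sketch below computes that; the independence model, treating 5 as the total number of attempts, and the assumption that a 500 response triggers a retry at all are all mine and not confirmed anywhere in this thread.

```python
from fractions import Fraction


def p_request_fails(bad_targets, total_targets, attempts):
    """Chance that all attempts hit a bad target, assuming independent, uniform picks."""
    return Fraction(bad_targets, total_targets) ** attempts


one_bad = p_request_fails(1, 3, 5)  # first test: T1 down
two_bad = p_request_fails(2, 3, 5)  # second test: 2 of 3 targets returning 500

print("1 of 3 bad:", one_bad, float(one_bad))
print("2 of 3 bad:", two_bad, float(two_bad))
print("expected failures per 1M requests (2 of 3 bad):", float(two_bad * 1_000_000))
```

Under this model some client-visible errors over ~1M requests would be expected in both tests, which is why the actual retry and slot-selection behavior matters here.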



@likharev please be more detailed in your descriptions: include some log snippets and the specifics of the config, the upstream config, and/or the targets. It is really hard to reason about these things without the details.



Unfortunately I cannot provide this information right now; I have to repeat my tests, and I’ll do it as soon as I can. What information are you looking for specifically: normal log, debug log, etc.?



Please provide a minimal example that reproduces the problem, and logs showing the error message. Debug-level logs would be nice.
