Upstreams and failover


From one of the older posts, “Failover API handling”:

Unfortunately, we do not observe this behavior.
We run Kong 0.14.1.
Our setup:
K1, K2, K3: three Kong instances behind a load balancer LB that exposes DNS ‘’

We have an upstream U1 configured with three targets of equal weight: T1, T2, T3.

We have a plugin on a service S1 that effectively calls set_upstream(U1) with the name of the configured upstream.

We send requests to S1, and while they are running we terminate the service on T1, so T1 starts returning 502.

Given the service retry policy (retries = 5) and two remaining healthy targets, intuitively the client should not see any errors.
In reality, the client does see some errors. Is this a bug in the load balancer?
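To make the expectation concrete, here is a minimal sketch of the behavior I would expect, assuming a strict round-robin wheel shared across requests and Nginx-style retries (the targets T1–T3 and retries = 5 come from the setup above; the balancer model itself is an assumption, not Kong's actual implementation):

```python
import itertools

# Three equally weighted targets; T1 has been terminated.
targets = ["T1", "T2", "T3"]
DEAD = {"T1"}

# A shared round-robin wheel, as a stand-in for the balancer's slot list.
wheel = itertools.cycle(targets)

def proxy_request(retries=5):
    """Return the target that served the request, or None if every
    attempt (1 initial + `retries`) landed on a dead target."""
    for _ in range(1 + retries):
        target = next(wheel)
        if target not in DEAD:
            return target  # a healthy target answered
    return None  # the client would see a 502

results = [proxy_request() for _ in range(1000)]
errors = results.count(None)
print(errors)  # with strict round-robin and retries=5, this stays 0
```

Under this model the retry always lands on the next slot, so a single dead target out of three can never exhaust five retries and the client sees zero errors — which is exactly why the observed errors are surprising.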


Hmm, I believe it is passive health checks that work by letting a __ number of tx’s fail before marking a target down. Active health checks hit a health endpoint every __ number of seconds, and if it reports down _ consecutive times, the target is marked down. I think you want active health checks if you want the client to see at most ___ failures before the target is marked down.

Another important point with the healthcheck lib that I haven’t looked into is whether its state is global or local to each worker. If it’s global, you would not see many failures; if it’s per worker, or it takes time for the workers to reach consensus on the error count, then you would see discrepancies in the client error count even in active health-check mode.
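Back-of-the-envelope arithmetic for that point, under an assumed model (this is not Kong's actual implementation): if each worker keeps its own failure counter, the worst case is that every worker must independently observe the failure threshold before it stops using the target.

```python
# Assumed model: passive health check trips after THRESHOLD
# consecutive failures, with WORKERS nginx worker processes.
THRESHOLD = 3   # failures before a target is marked down
WORKERS = 4     # worker processes serving traffic

# Shared counter: the target goes down after THRESHOLD failures total.
shared_failures_seen = THRESHOLD

# Per-worker counters, worst case: each worker must see THRESHOLD
# failures on its own before it stops routing to the target.
per_worker_failures_seen = WORKERS * THRESHOLD

print(shared_failures_seen, per_worker_failures_seen)  # 3 12
```

So per-worker state can multiply the client-visible failure count by the number of workers, which would explain an error-count discrepancy even with health checks enabled.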


I have a discovery service that does active monitoring of the servers, and a Kong plugin that updates the upstream/targets based on that, so I really don’t want yet another active monitor. Even if healthcheck state is per worker, I think the retry logic should not be affected, and I believe health checks are not relevant here.

Suppose I have 3 servers and 100 slots. According to the documentation, the balancer will create a 100-entry list with a roughly 33% distribution per target and use round-robin. So if I happen to hit the dead server, it should retry on the next one, up to the maximum number of retries, and this is done (or should be) in the context of the same worker. The probability of hitting the dead server 5 times in a row in this case is negligible.

Unless I don’t understand how that works.


What errors are you getting? What’s in the Kong logs?