Failover API handling

Hi Team,

I have a scenario: when the upstream API fails (responds with a 5XX error code), I need to redirect the request to a backup upstream API. Is this feature available, or is there anything similar?

Does this help?

If not, can you clarify what behavior you are seeking?

Not exactly; the link is about health checks. In my scenario, I want to redirect to another upstream URL if the response from the first one fails.
Normal scenario:
client -> Kong redirects to the url1 upstream -> upstream API responds successfully -> Kong returns the response to the client
The scenario I am interested in:
client -> Kong redirects to the url1 upstream -> upstream API responds with an error (5XX code) -> Kong should now redirect the request to url2 -> the final response from url2 goes to the client
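The desired behaviour can be modelled as a small Python function. This only sketches the logic described above and is not a Kong feature or plugin API: `fetch_with_failover`, the injected `fetch` transport, and the host names are hypothetical, and a "failed" response is assumed to mean any 5XX status.

```python
from typing import Callable, Tuple

# (status code, body) as returned by a hypothetical HTTP call
Response = Tuple[int, bytes]
Fetch = Callable[[str, str], Response]


def fetch_with_failover(path: str, primary: str, backup: str, fetch: Fetch) -> Response:
    """Send the request to `primary`; if it answers with a 5XX, resend it to `backup`.

    `fetch(host, path)` is an injected, hypothetical transport function so the
    logic can be exercised without a network.
    """
    status, body = fetch(primary, path)
    if 500 <= status <= 599:
        # Primary failed: the client only ever sees the backup's response.
        return fetch(backup, path)
    return status, body


if __name__ == "__main__":
    def fake_fetch(host: str, path: str) -> Response:
        # Illustrative only: url1 always fails, url2 always succeeds.
        return (503, b"") if host == "url1" else (200, b"from url2")

    print(fetch_with_failover("/orders", "url1", "url2", fake_fetch))  # (200, b'from url2')
```

Note that the retry happens per request: the next client request starts at the primary again, which is the key difference from a circuit breaker.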

Actually, WSO2 has an option to configure a backup/failover URL. We want to migrate from WSO2 to Kong.

Of course Kong does failovers. Maybe not as an explicit failover feature, but it certainly covers the functionality.

The main property involved is the retries property of the Service entity. It does exactly what you think it does. The question now is: how does Kong select the next upstream service when a failure occurs?

When using simple DNS-based load balancing (without an upstream entity), Kong will do a round-robin on the DNS record. So if it is an A record or SRV record with multiple entries, that's where it selects the next upstream service.
The catch is that with an SRV record and unequal weights, multiple tries might end up at the same backend service. Let's use an example to explain this. Say we have an (extreme) situation with an SRV record containing:

  • name = a.service.local, weight = 1
  • name = b.service.local, weight = 1000

And let’s assume the Service.retries = 5.

In this case, if b.service.local returns a 500, we have 5 more tries to go, but due to the weights (1 vs 1000) there is a high likelihood that each retry will also hit b.service.local. Because DNS records carry no notion of health, Kong will just retry the next entry in line, which might actually be the same backend. This problem does not occur with equal weights (or with A records, since those don't carry any weight info), because every next entry is then actually a different one.
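The weighted round-robin effect can be checked with a small Python model. It assumes, as later described in this thread, that weights are reduced by their GCD and expanded into a shuffled list that is walked in order; `build_wheel` and `tries` are illustrative names, not Kong internals. With one `a` entry among 1001 positions, only the 6 starting positions whose window of 6 attempts covers `a` avoid hitting `b` every time, whatever the shuffle.

```python
import math
import random
from functools import reduce


def build_wheel(records, seed=None):
    """Expand (name, weight) records into a shuffled round-robin list.

    Assumption: weights are reduced by their GCD and each name appears
    `weight` times; this is a model, not Kong's source code.
    """
    g = reduce(math.gcd, (w for _, w in records))
    wheel = [name for name, w in records for _ in range(w // g)]
    random.Random(seed).shuffle(wheel)
    return wheel


def tries(wheel, start, retries):
    """Backends hit by the first attempt plus `retries` retries, walking the wheel in order."""
    return [wheel[(start + i) % len(wheel)] for i in range(retries + 1)]


srv = [("a.service.local", 1), ("b.service.local", 1000)]
wheel = build_wheel(srv, seed=7)

# Count starting positions where all 6 attempts (1 + Service.retries = 5) hit b.
only_b = sum(
    all(name == "b.service.local" for name in tries(wheel, s, 5))
    for s in range(len(wheel))
)
print(only_b, "of", len(wheel))  # 995 of 1001
```

With equal weights the reduced wheel holds one entry per name, so consecutive attempts always land on different backends.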

All in all, this really is a corner case, and if you have a proper set of backend services and a well-chosen retries value, this should be of no concern to you.

Slightly more complex is the load balancer case (with an upstream entity). Here Kong does exactly the same thing: it selects the next entry in the load balancer. But in this case the balancer does have a notion of health. Once an upstream backend service is considered unhealthy (the circuit breaker tripped after a number of failures), the balancer will not retry that same backend service.

If you set up "passive" health checks, each failure counts against the health of the backend service. So with a Service.retries = 5 setting and a passive health check that trips after, e.g., 3 failures, you're completely covered.
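Why retries plus a passive health check covers the weighted case can be shown with a toy Python balancer. This is a deliberately simplified model, not Kong's implementation: `PassiveBalancer` and `proxy` are hypothetical names, failures are only counted (never reset by successes), and the heavily weighted target `b` is assumed to always return 500.

```python
from dataclasses import dataclass, field


@dataclass
class PassiveBalancer:
    """Toy balancer with a passive circuit breaker (simplified model, not Kong's code).

    A target is skipped once it has accumulated `threshold` failures.
    """
    wheel: list
    threshold: int
    failures: dict = field(default_factory=dict)
    pos: int = 0

    def healthy(self, target):
        return self.failures.get(target, 0) < self.threshold

    def next_target(self):
        # Walk the wheel in round-robin order, skipping unhealthy targets.
        for _ in range(len(self.wheel)):
            target = self.wheel[self.pos % len(self.wheel)]
            self.pos += 1
            if self.healthy(target):
                return target
        return None  # every target is unhealthy

    def report_failure(self, target):
        self.failures[target] = self.failures.get(target, 0) + 1


def proxy(balancer, retries, backend_ok):
    """Try up to 1 + `retries` targets; return the one that succeeded, or None."""
    for _ in range(retries + 1):
        target = balancer.next_target()
        if target is None:
            return None
        if backend_ok(target):
            return target
        balancer.report_failure(target)
    return None


# b is heavily weighted but always fails; a is healthy.
wheel = ["b"] * 10 + ["a"]
print(proxy(PassiveBalancer(wheel, threshold=3), retries=5, backend_ok=lambda t: t == "a"))  # a
```

After three failures `b` is skipped, so the fourth attempt reaches `a` within the 6 allowed tries; with a threshold the retries never reach, all 6 attempts would land on `b`.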

Does that help?


Thanks Tieske.
I really don't want health checks to come into the picture, because once the circuit breaker trips, all subsequent client requests will be forwarded directly to URL2 in my scenario. So a load balancer might not be a good option for me.

Regarding DNS-based load balancing:
1. The good thing to note here is that there is no health check.
2. Now, I always want URL1 to be picked first, so I give it more weight and set Service.retries = 1:
name = url1.service.local, weight = 60
name = url2.service.local, weight = 50
So, as per your statement, if url1 fails once, it will pick url2 next. Is my understanding correct here?

I can't give equal weights to both, since I always need Kong to hit url1 on the first try.

If these are your constraints, then you might want to reconsider your setup. This is rather brittle, IMHO. I think modern infrastructures like Kubernetes don't even support a setup like that.

Kong cannot do this.

Your example with DNS-based load balancing, with weights 60 and 50, also doesn't work. Kong will reduce the weights to their smallest equivalent ratio, in this case 6 and 5. From there it will create a randomized list with 6 + 5 = 11 entries and perform a round-robin on that. The round-robin pointer is only reset when the DNS record expires, so in some cases it will simply start with the backup URL.
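The arithmetic above can be reproduced in a few lines of Python. This models the behaviour as described in this reply (GCD reduction, then a shuffled list walked round-robin); `reduce_weights` and `build_wheel` are illustrative names, not Kong functions.

```python
import math
import random
from functools import reduce


def reduce_weights(weights):
    """Divide all weights by their greatest common divisor (60, 50 -> 6, 5)."""
    g = reduce(math.gcd, weights)
    return [w // g for w in weights]


def build_wheel(records, seed=None):
    """Expand reduced weights into a shuffled list used for round-robin (model only)."""
    names = [name for name, _ in records]
    reduced = reduce_weights([w for _, w in records])
    wheel = [name for name, w in zip(names, reduced) for _ in range(w)]
    random.Random(seed).shuffle(wheel)
    return wheel


records = [("url1.service.local", 60), ("url2.service.local", 50)]
print(reduce_weights([60, 50]))  # [6, 5]

wheel = build_wheel(records, seed=1)
print(len(wheel))  # 11
# Wherever the round-robin pointer happens to be, the first try lands on
# url2 for 5 of the 11 positions, so url1 is not guaranteed to go first.
print(wheel.count("url2.service.local"))  # 5
```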

The proper way to do this would be to implement the priority field of SRV records. But since we've never had a request for that, Kong doesn't support it (it only uses the entries with the highest precedence from the SRV record).
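The difference between "highest precedence only" and a priority-aware failover can be sketched in Python. Per RFC 2782, a lower SRV priority value means more preferred. `SrvRecord`, `highest_precedence`, and `priority_groups` are hypothetical helpers, and the host names and port 8000 are illustrative; `priority_groups` shows what a priority-aware client would walk, which this reply says Kong does not do.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SrvRecord:
    target: str
    port: int
    priority: int  # RFC 2782: a lower value means more preferred
    weight: int


def highest_precedence(records):
    """Entries sharing the best (lowest) priority value, the only ones Kong reportedly uses."""
    best = min(r.priority for r in records)
    return [r for r in records if r.priority == best]


def priority_groups(records):
    """Records grouped from most to least preferred; a priority-aware client
    would only move to the next group when every entry in the current one fails."""
    ordered = sorted(records, key=lambda r: r.priority)
    return [list(group) for _, group in groupby(ordered, key=lambda r: r.priority)]


srv = [
    SrvRecord("url1.service.local", 8000, priority=10, weight=1),
    SrvRecord("url2.service.local", 8000, priority=20, weight=1),
]
print([r.target for r in highest_precedence(srv)])  # ['url1.service.local']
print([[r.target for r in g] for g in priority_groups(srv)])
# [['url1.service.local'], ['url2.service.local']]
```

Under "highest precedence only", the priority-20 entry is simply ignored, so it can never act as a backup.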

That said, since the world is moving to more dynamic setups, with Kubernetes and the like, I think you should consider making your failover instance a full instance and not just a backup. This will make your infrastructure more future-proof.

I get your point. But my case is in no way related to load balancing.
In fact, url1 is a cache layer and url2 is the actual application.
So I must hit the cache layer first, and when the response is not cached there, I must hit the application.

Ah, thanks for explaining that.

Maybe this can help:

BTW: Kong only redirects to a different IP/port combo, whereas you are mentioning URLs to try. Can you give me a better description of how it works for you, with URLs and other config info? Maybe a full (anonymized) config example?

I’m interested in the caching use case.

Simple. In the WSO2 gateway, when configuring the endpoint/upstream URLs, there is an option to enter a failover/backup URL. So we used the cache URL as the primary URL and the actual application URL as the failover URL.

So when the response is not present in the cache layer, the gateway redirects the request to the failover URL, i.e. the application URL. Note that the cache and the application are hosted on different servers. It would be good to have an optional setting like failoverurl when configuring a service. Then we could register both the main and backup URLs in one go.

I have gone through your link where Kong is used to cache the response. But we host our own cache layer, and we should be using that.

How do you populate the cache?

What I get from it now is that you hit the cache layer, and if that fails, you hit the application/site. On the next request for the same resource the same thing happens again, unless somewhere in between you populate the cache…

Also, how do you deal with the latency? Caching is probably fine, but every request to your backend will be delayed by the initial failing call to the caching layer. How does that add up?
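The latency concern can be put as a simple expected-value calculation: hits cost one cache call, misses pay the cache call and the application call. The Python sketch below assumes a miss takes as long as a hit at the cache layer, and the 50 ms / 1000 ms / 50% figures are purely illustrative, not measurements from this thread.

```python
def expected_latency_ms(cache_ms, app_ms, hit_ratio):
    """Average latency when every request tries the cache first.

    Hits cost only the cache call; misses pay the cache call *and* the
    application call: cache_ms + (1 - hit_ratio) * app_ms.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_ms + (1.0 - hit_ratio) * app_ms


# Illustrative numbers only: 50 ms cache, 1000 ms application, 50% hits.
print(expected_latency_ms(50, 1000, 0.5))  # 550.0
```

The worst case (every request misses) is simply cache_ms + app_ms, which is the figure to compare against a latency budget.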

The cache layer is updated by a DB sync, which is a separate batch-job process. It is not required to always serve the response from the cache. And the latency of the overall cycle (cache and application) is well below 4 seconds, so it will not be an issue.

Can we implement this with a custom plugin?

Do you have any suggestions on how to proceed?

The closest thing to this in Kong, I think, would be using an upstream with targets and setting the weight (priority) on the cache VIP crazy high, so it takes the brunt of the transactions. Then, with passive health checks plus active ones (to help automatically re-enable the primary cache VIP when it is healthy again), a 5xx error can make that high-weight target fail, so traffic falls over to the failover app service VIP. It won't be 100% what you want, because the failover URL will still get called sometimes in general (but IMO it's close enough). But it's how you could do it without learning to write a more complex plugin (anything can be achieved with Kong plugins, but I don't know how to build this out entirely without playing with the code myself). It's an extremely rare use case too, I feel.


Did you succeed in configuring that scenario? As far as I understand, this is not how Kong behaves. It helps with failover based on passive health checks that mark the node as unhealthy based on the HTTP status code, nothing more. I mean, a request that gets a 500 will not be redirected to the other node; it is lost.
Am I right?

This was the solution that worked for me, with a similar requirement for a primary and a backup service.

The suggested design only sends about 1 in 1000 requests to the backup server instead of the primary, which is reasonable in my case.
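The "about 1 in 1000" figure follows from the target weights when both targets are healthy: each target's share is its weight divided by the total weight. A minimal Python check, using the weights from the configuration below (`traffic_shares` is an illustrative helper, not a Kong API):

```python
from fractions import Fraction


def traffic_shares(weights):
    """Expected share of requests per target when all targets are healthy (weight / total)."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


shares = traffic_shares({"primary": 1000, "backup": 1})
print(shares["backup"])  # 1/1001
```

Once the passive health check marks the primary unhealthy, its share drops out and the backup receives all traffic until the primary is marked healthy again.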

Another useful article was "Upstreams and failover".

Here is my configuration:

_format_version: '1.1'
services:
- name: my-service
  host: my-upstream
  port: 8000
  protocol: http
  retries: 3
  routes:
  - name: my-route
    paths:
    - /
upstreams:
- name: my-upstream
  targets:
  - target: # primary host:port (not shown in the original post)
    weight: 1000
  - target: # backup host:port (not shown in the original post)
    weight: 1
  healthchecks:
    active:
      concurrency: 2
      healthy:
        http_statuses:
        - 200
        - 302
        interval: 0
        successes: 1
      http_path: /testKong
      timeout: 1
      type: http
      unhealthy:
        http_failures: 3
        http_statuses:
        - 429
        - 404
        - 500
        - 501
        - 502
        - 503
        - 504
        - 505
        interval: 1
        tcp_failures: 3
        timeouts: 3
    passive:
      healthy:
        http_statuses:
        - 200
        - 201
        - 202
        - 203
        - 204
        - 205
        - 206
        - 207
        - 208
        - 226
        - 300
        - 301
        - 302
        - 303
        - 304
        - 305
        - 306
        - 307
        - 308
        successes: 1
      type: http
      unhealthy:
        http_failures: 1
        http_statuses:
        - 429
        - 500
        - 503
        tcp_failures: 1
        timeouts: 1
  slots: 1000

Hi, is there a way to implement a similar config with the Kong Ingress Controller for K8s?

Thank you