We are trying to understand some behavior of Kong DB-less mode.
We currently have:
Amazon EKS 1.16
Kong 2.0
Kong Ingress Controller 0.8.1
Flagger (Kubernetes Operator to automate Canary releases)
Linkerd 2.7.1 (Service Mesh to enable Canary deployment in combination with Flagger)
SM v1
SM v2
Note : SM is our microservice
To deploy our application, we are using Helm.
We are deploying Kong and SM in two different Helm releases.
Our SM chart contains everything (Deployment, Service …) plus the Ingress declaration.
Our use case:
SM v1 is deployed and we are sending traffic to our microservice.
Update the SM Helm release to SM v2.
At this point, we have two versions of the microservice, and the Ingress declaration hasn’t changed.
Observe results
Our results:
Before the update, everything is fine, all requests are responding 200.
During the update, there is a small time window (~4-5 seconds) where requests return errors.
After that time window, everything is fine.
Is your KONG_NGINX_WORKER_PROCESSES environment variable set to 2 or higher? Older defaults set it to 1, which we determined caused issues around the time of config updates.
If that doesn’t clear the issue, are those 503s definitely coming from Kong, or do they appear in the application logs as well? If they are coming from Kong, they should indicate DNS resolution failures, which usually means that no Pods providing that service are ready yet. However, normal rollouts in Kubernetes shouldn’t bring down the existing Pods or update the DNS listing until a sufficient number of new replicas become ready.
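That last guarantee comes from the Deployment's rolling-update settings. If you want to make it explicit, something along these lines keeps existing Pods serving until replacements pass their readiness probes (a sketch only; the surge/unavailable values here are illustrative, not your actual manifest):

```yaml
# Sketch: Deployment strategy fragment that avoids taking old Pods
# down before new ones are ready during a rollout.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never remove an existing Pod early
      maxSurge: 1         # bring one new Pod up first, wait for readiness
```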
For the environment variable KONG_NGINX_WORKER_PROCESSES, we have set it to auto.
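For reference, pinning it explicitly instead of relying on auto would look roughly like this (a sketch of the Kong proxy container's env section; the surrounding layout is assumed from a typical chart, not taken from your values):

```yaml
# Sketch: env fragment on the Kong proxy container
env:
  - name: KONG_NGINX_WORKER_PROCESSES
    value: "2"   # pin to 2 or higher rather than "auto"
```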
Currently, we don’t see any 503 errors in our SM logs.
It’s not a “normal” rollout; by that I mean it’s not just one Deployment.
To perform a canary release we have two Deployments.
We have a “primary” Deployment, which doesn’t change, and a “canary” Deployment, which is the one being updated.
Then we have a TrafficSplit, managed by Linkerd, that handles the weight distribution between the two services, primary and canary. So technically, Kong contacts a single service without knowing which “sub-services” it will actually reach.
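For context, the TrafficSplit Flagger maintains looks roughly like this (a sketch; the service names and weights are assumptions based on our “SM” naming, not our exact manifest):

```yaml
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: sm
spec:
  service: sm               # apex service that Kong sends traffic to
  backends:
    - service: sm-primary   # stable Deployment
      weight: 90
    - service: sm-canary    # Deployment being updated
      weight: 10
```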
With Linkerd in the mix, perhaps you need to enable the service-upstream annotation on those Services? I’m still not sure exactly how the apparent DNS failure is occurring, but you should generally use that annotation whenever sidecars from a mesh proxy are in charge of routing decisions.
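For anyone following along, the annotation goes on the Kubernetes Service the Ingress points at. A sketch, assuming a Service named sm with illustrative ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sm
  annotations:
    # Tells the Kong Ingress Controller to proxy to the Service's
    # cluster IP instead of individual Pod endpoints, leaving the
    # actual routing decision to the mesh sidecar.
    ingress.kubernetes.io/service-upstream: "true"
spec:
  selector:
    app: sm
  ports:
    - port: 80
      targetPort: 8080
```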