I am running Kong 2.0.4 on GKE 1.16, deployed with Helm chart kong-1.8.0. About once a week I need to retrigger the Kong deployment (by adding an annotation or similar) because all routes stop responding. In the GCP Load Balancer logs I see lots of 502s with "failed_to_pick_backend".
As far as I know, this happens because the health check finds no healthy endpoints (pods, in this case, since the load balancer uses NEGs). At the moment, 2 out of 25 network endpoints are marked with a red dot in the health status. Despite this, if I run `kong health` inside the proxy container of those pods, Kong reports that everything is OK.
I also noticed something odd. If I run:

```shell
kubectl port-forward pod/release-kong-XXXXXXX-XXXX 8000:8000
```

and then:

```shell
curl -v 127.0.0.1:8000/status
```

(which is the liveness probe configured in the deployment by the Helm chart), I receive a 404 with no payload from the pods marked green in the NEG, but a 404 with the JSON payload `{"message": "no Route matched with those values"}` from the pods marked as failed.
Remarkably, if I run `kubectl logs release-kong-XXXXX-XXX -c proxy`, on the healthy pods all the health checks coming from the 188.8.131.52/16 network are logged as 200 OK (how is that possible, when my own curl gets a 404?), while on the failing pods those same health checks get a 404.
In case it helps, this is my values.yaml:
```yaml
image:
  repository: myrepo/kong-with-oidc
  tag: "v2.0"

replicaCount: 1

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: "app.kubernetes.io/name"
                operator: "In"
                values:
                  - "kong"
          topologyKey: "kubernetes.io/hostname"

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 1000m
    memory: 1Gi

env:
  nginx_proxy_keepalive_timeout: "620s"
  nginx_proxy_proxy_max_temp_file_size: "10240m"
  trusted_ips: "0.0.0.0/0"
  real_ip_header: X-Forwarded-For
  real_ip_recursive: "on"
  plugins: oidc,bundled

proxy:
  enabled: true
  type: NodePort
  externalTrafficPolicy: Local
  http:
    enabled: true
    servicePort: 80
    containerPort: 8000
  tls:
    enabled: false
  ingress:
    enabled: true
    tls:
      - hosts:
          - api.mydomain.host
        secretName: api-host-production-tls
      - hosts:
          - internal.mydomain.host
        secretName: internal-host-production-tls
    annotations:
      kubernetes.io/ingress.class: gce
      acme.cert-manager.io/http01-edit-in-place: "true"
      cert-manager.io/cluster-issuer: production
      external-dns.alpha.kubernetes.io/hostname: api.mydomain.host,internal.mydomain.host
      external-dns.alpha.kubernetes.io/ttl: "200"
    path: /*

autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: External
      external:
        metric:
          name: loadbalancing.googleapis.com|https|request_count
          selector:
            matchLabels:
              resource.labels.url_map_name: k8s2-um-9g4kep5s-XXXXXXXXXXXXXXq
        target:
          type: AverageValue
          value: 50
```
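One idea I have been considering (untested, just a sketch): Kong ships a dedicated Status API listener, which can be enabled through the same `env` block above, so health checks could hit a real endpoint instead of the proxy's routing table:

```yaml
# Sketch only: enable Kong's Status API on port 8100 via the chart's env block.
# This maps to KONG_STATUS_LISTEN inside the container.
env:
  status_listen: "0.0.0.0:8100"
```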
The base image is as simple as:
```dockerfile
FROM kong:2.0

USER root
RUN apk --update --no-cache add git unzip && \
    luarocks install kong-oidc && \
    apk del git unzip
USER kong
```
Am I doing something totally wrong? I wonder whether it would help to add a fake API behind /status (although the health check should still be something performed inside Kong itself), or to expose port 8100 (but then how do I instruct the NEG health check to probe that port instead of 8000?).

Just wondering what your approach would be.
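Regarding the last question: if I understand the GKE docs correctly, on newer GKE patch versions a `BackendConfig` attached to the proxy Service can override the NEG health check port and path, something like the following (the name `kong-hc` is hypothetical, and I have not verified this on my cluster version):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: kong-hc            # hypothetical name
spec:
  healthCheck:
    type: HTTP
    port: 8100             # Kong's status listener, if enabled
    requestPath: /status
```

which would then be referenced from the Service with the annotation `cloud.google.com/backend-config: '{"default": "kong-hc"}'`.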