Kong on GKE and 502 failed_to_pick_backend

Hi everyone,

I'm running Kong 2.0.4 on GKE 1.16, deployed with the kong-1.8.0 Helm chart. I am experiencing an issue: about once a week I need to retrigger the Kong deployment (by adding an annotation or whatever) because all routes stop responding. In the GCP Load Balancer logs I see lots of 502 “failed_to_pick_backend” errors.

As far as I know, this happens because the health check finds no healthy endpoints (pods, in this case, since the load balancer is using NEGs). At the moment, 2 of 25 network endpoints are marked with a red dot in the health status. Despite this, if I run kong health inside the proxy container of those pods, Kong reports that everything is OK.

I also noticed that if I do a:
kubectl port-forward pod/release-kong-XXXXXXX-XXXX 8000:8000

and then a:
curl -v 127.0.0.1:8000/status
(which is the liveness probe configured in the deployment by the helm chart)

… I receive a 404 with no payload from pods marked green in the NEG, but a 404 with a JSON payload (“message”: “no Route matched with those values”) from the pods marked as failed.

Remarkably, if I run kubectl logs release-kong-XXXXX-XXX -c proxy on a healthy pod, all health checks coming from the 130.211.0.0/16 network report a 200 OK (how is that possible, given that I get a 404?), while on the failing pods they report a 404.
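For reference, these are the commands I use to compare the two (pod name is a placeholder, and this assumes curl is available in the proxy image):

    # tail the requests coming from the GCE health checker range
    kubectl logs release-kong-XXXXXXX-XXXX -c proxy | grep '130.211.'

    # hit the same path the probe uses, from inside the container
    kubectl exec release-kong-XXXXXXX-XXXX -c proxy -- \
        curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/status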

In case it helps, this is my values.yaml file:
    ingressController:
      installCRDs: false
      resources:
        limits:
          cpu: 1000m
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 500Mi

    image:
      repository: myrepo/kong-with-oidc
      tag: "v2.0"

    replicaCount: 1

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: "app.kubernetes.io/name"
                    operator: "In"
                    values:
                      - "kong"
              topologyKey: "kubernetes.io/hostname"

    resources:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 1000m
        memory: 1Gi

    env:
      nginx_proxy_keepalive_timeout: "620s"
      nginx_proxy_proxy_max_temp_file_size: "10240m"

      trusted_ips: "0.0.0.0/0"
      real_ip_header: X-Forwarded-For
      real_ip_recursive: "on"
      plugins: oidc,bundled

    proxy:
      enabled: true
      type: NodePort
      externalTrafficPolicy: Local

      http:
        enabled: true
        servicePort: 80
        containerPort: 8000

      tls:
        enabled: false

      ingress:
        enabled: true
        tls:
          - hosts:
            - api.mydomain.host
            secretName: api-host-production-tls
          - hosts:
            - internal.mydomain.host
            secretName: internal-host-production-tls
        annotations:
          kubernetes.io/ingress.class: gce
          acme.cert-manager.io/http01-edit-in-place: "true"
          cert-manager.io/cluster-issuer: production
          external-dns.alpha.kubernetes.io/hostname: api.mydomain.host,internal.mydomain.host
          external-dns.alpha.kubernetes.io/ttl: "200"
        path: /*

    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 30
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
        - type: External
          external:
            metric:
              name: loadbalancing.googleapis.com|https|request_count
              selector:
                matchLabels:
                  resource.labels.url_map_name: k8s2-um-9g4kep5s-XXXXXXXXXXXXXXq
            target:
              type: AverageValue
              value: 50

The base image is as simple as:

    FROM kong:2.0

    USER root

    RUN apk --update --no-cache add git unzip && \
        luarocks install kong-oidc && \
        apk del git unzip

    USER kong

Am I doing something totally wrong? I wonder if it might help to add a fake API behind /status (though the health check should really be something performed inside Kong itself), or to expose port 8100 (but then how do I instruct the NEG health check to probe that port instead of 8000?).
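For the second option, my understanding (untested sketch; names are placeholders) is that I would enable Kong's Status API listener via the chart env and then point the NEG health check at it with a GKE BackendConfig, something like:

    # values.yaml: enable the Status API on 8100
    env:
      status_listen: "0.0.0.0:8100"

    ---
    # BackendConfig (apiVersion may be cloud.google.com/v1beta1 on older GKE)
    apiVersion: cloud.google.com/v1
    kind: BackendConfig
    metadata:
      name: kong-status-healthcheck
    spec:
      healthCheck:
        type: HTTP
        port: 8100
        requestPath: /status

and then reference it from the proxy Service with the cloud.google.com/backend-config: '{"default": "kong-status-healthcheck"}' annotation. I have not verified that this works with the chart version above, though.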

Just wondering what your approach would be.

Thanks!

Were you able to resolve the issue?
I am having the same issue; my health check for the Kong ingress is failing as well.