Hi everyone,
I'm running Kong 2.0.4 on GKE 1.16, deployed with Helm chart kong-1.8.0, and I am experiencing an issue: about once a week I need to retrigger the Kong deployment (by adding an annotation or similar) because all routes stop responding. In the GCP Load Balancer logs I see lots of 502s with "failed_to_pick_backend".
As far as I know, this happens because the health check finds no healthy endpoints (pods, in this case, since it is using NEGs). At the moment, 2 out of 25 network endpoints are marked with a red dot in the health status. Despite this, if I run kong health
inside the proxy container of those pods, Kong reports that everything is OK.
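(Side note: the same per-endpoint health status is also visible from the CLI via gcloud; the backend service name below is just a placeholder for the auto-generated one attached to my URL map:)
gcloud compute backend-services list --global
gcloud compute backend-services get-health k8s1-XXXXXXXX-default-release-kong-proxy-80-XXXXXXXX --global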
I also noticed that if I do:
kubectl port-forward pod/release-kong-XXXXXXX-XXXX 8000:8000
and then:
curl -v 127.0.0.1:8000/status
(which is the liveness probe configured in the deployment by the Helm chart)
… I receive a 404 with an empty body from the pods marked green in the NEG, but a 404 with the JSON payload {"message":"no Route matched with those values"} from the pods marked as failed.
Remarkably, if I run kubectl logs release-kong-XXXXX-XXX -c proxy, on the healthy pods all the health checks coming from the 130.211.0.0/16 network report a 200 OK (how is that possible, when I get a 404 through the port-forward?), while on the failing pods they report a 404.
In case it helps, this is my values.yaml:
ingressController:
  installCRDs: false
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 500Mi
image:
  repository: myrepo/kong-with-oidc
  tag: "v2.0"
replicaCount: 1
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: "app.kubernetes.io/name"
            operator: "In"
            values:
            - "kong"
        topologyKey: "kubernetes.io/hostname"
resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 1000m
    memory: 1Gi
env:
  nginx_proxy_keepalive_timeout: "620s"
  nginx_proxy_proxy_max_temp_file_size: "10240m"
  trusted_ips: "0.0.0.0/0"
  real_ip_header: X-Forwarded-For
  real_ip_recursive: "on"
  plugins: oidc,bundled
proxy:
  enabled: true
  type: NodePort
  externalTrafficPolicy: Local
  http:
    enabled: true
    servicePort: 80
    containerPort: 8000
  tls:
    enabled: false
  ingress:
    enabled: true
    tls:
    - hosts:
      - api.mydomain.host
      secretName: api-host-production-tls
    - hosts:
      - internal.mydomain.host
      secretName: internal-host-production-tls
    annotations:
      kubernetes.io/ingress.class: gce
      acme.cert-manager.io/http01-edit-in-place: "true"
      cert-manager.io/cluster-issuer: production
      external-dns.alpha.kubernetes.io/hostname: api.mydomain.host,internal.mydomain.host
      external-dns.alpha.kubernetes.io/ttl: "200"
    path: /*
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: External
    external:
      metric:
        name: loadbalancing.googleapis.com|https|request_count
        selector:
          matchLabels:
            resource.labels.url_map_name: k8s2-um-9g4kep5s-XXXXXXXXXXXXXXq
      target:
        type: AverageValue
        value: 50
The base image is as simple as:
FROM kong:2.0
USER root
RUN apk --update --no-cache add git unzip && \
    luarocks install kong-oidc && \
    apk del git unzip
USER kong
Am I doing something totally wrong? I wonder whether it would help to put a fake API behind /status (although the health check should still be something performed inside Kong itself), or to expose port 8100 (but then how do I instruct the NEG health check to probe that port instead of 8000?). Rough sketches of both ideas follow below.
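To make the first idea concrete: I was thinking of a dedicated /status route with the bundled request-termination plugin answering 200 directly from Kong, so the health check never reaches an upstream. This is an untested sketch; the names healthcheck-ok and kong-status are mine, the serviceName is a placeholder (it is never actually reached, since the plugin terminates the request first), and I believe the konghq.com/plugins annotation is supported by the ingress controller version shipped with chart 1.8.0, but I haven't verified it:

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: healthcheck-ok
plugin: request-termination
config:
  status_code: 200
  message: ok
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: kong-status
  annotations:
    kubernetes.io/ingress.class: kong
    konghq.com/plugins: healthcheck-ok
spec:
  rules:
  - http:
      paths:
      - path: /status
        backend:
          serviceName: release-kong-proxy   # placeholder; never reached, the plugin terminates first
          servicePort: 80

For the 8100 idea: Kong has a dedicated status listener (status_listen, available since 1.4) that serves /status outside of the proxy routes, and on GKE a BackendConfig can point the health check at a different container port. Again a sketch under assumptions: kong-proxy-hc is a placeholder name, and I am not sure the healthCheck field of BackendConfig (cloud.google.com/v1, or v1beta1 on older versions) is available on my GKE 1.16 patch level:

# values.yaml: enable the status listener inside the proxy container
env:
  status_listen: 0.0.0.0:8100

# BackendConfig with a custom health check against the container port
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: kong-proxy-hc
spec:
  healthCheck:
    type: HTTP
    requestPath: /status
    port: 8100              # with NEGs the check targets the pod directly
    checkIntervalSec: 15
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2

# values.yaml again: attach the BackendConfig to the proxy Service
proxy:
  annotations:
    cloud.google.com/backend-config: '{"default": "kong-proxy-hc"}'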
I'm just wondering what your approach would be.
Thanks!