Liveness check is failing for ingress-controller in Kong pod hosted on AWS EKS

Here are the pod event logs:

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Normal   Killing    60m (x2 over 61m)     kubelet  Container ingress-controller failed liveness probe, will be restarted
  Normal   Pulling    36m (x13 over 61m)    kubelet  Pulling image "185908212527.dkr.ecr.eu-central-1.amazonaws.com/vendor/docker-kong-ingress-controller:1.0.0"
  Warning  Unhealthy  11m (x63 over 61m)    kubelet  Liveness probe failed: Get http://10.10.4.8:10254/healthz: dial tcp 10.10.4.8:10254: connect: connection refused
  Warning  BackOff    109s (x234 over 59m)  kubelet  Back-off restarting failed container

Ingress-controller pod logs:

-------------------------------------------------------------------------------
Kong Ingress controller
  Release:    1.0.0
  Build:      a34ce92
  Repository: git@github.com:Kong/kubernetes-ingress-controller.git
  Go:         go1.15.2
-------------------------------------------------------------------------------

W0707 18:47:10.631939       1 client_config.go:608] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2021-07-07T18:47:10Z" level=info msg="version of kubernetes api-server: 1.17+" api-server-host="https://172.20.0.1:443" git_commit=c5067dd1eb324e934de1f5bd4c593b3cddc19d88 git_tree_state=clean git_version=v1.17.17-eks-c5067d major=1 minor=17+ platform=linux/amd64
time="2021-07-07T18:47:10Z" level=info msg="kong version: 2.2.0" kong_version=2.2.0
time="2021-07-07T18:47:10Z" level=info msg="datastore strategy for kong: off"
time="2021-07-07T18:47:10Z" level=info msg="chosen Ingress API version: networking.k8s.io/v1beta1"

If you don't know the solution, could you let me know how I can debug this?

I'm not sure exactly what the ideal debug path is. You may want to try stripping the liveness check out of the deployment altogether and then manually inspecting the container to see whether that listener is in fact not coming online. I don't know of an obvious failure condition that would happen silently between the last successful event in your logs and the point where the controller starts the liveness server goroutine at
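
If it helps, here's a rough sketch of that manual inspection. The deployment/pod names and the container index are placeholders, and this assumes wget is available in the controller image; adjust to your setup:

  # Remove the liveness probe so the container stays up long enough to inspect
  # (replace the container index with whichever entry is the ingress-controller)
  kubectl patch deployment <kong-deployment> --type=json \
    -p='[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'

  # Once the pod is running, check whether anything is actually listening on 10254
  kubectl exec -it <kong-pod> -c ingress-controller -- wget -qO- http://127.0.0.1:10254/healthz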

If there is something that prevents that goroutine from starting, I'd expect an error log, e.g. if it can't instantiate the controller. With none, I'm not really sure what's going on; you'll probably want to inspect the problem instance to see exactly what weird state it's in and then try to work back to a cause.
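
For inspecting the instance, something like the following is usually enough to see where it got stuck (the pod name is a placeholder):

  # What kubelet saw: restart counts, last container state, probe failures
  kubectl describe pod <kong-pod>

  # Logs from the container that was just killed, in case it printed anything right before dying
  kubectl logs <kong-pod> -c ingress-controller --previous

  # Recent events in the namespace, in case something outside the pod is interfering
  kubectl get events --sort-by=.lastTimestamp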

FWIW, this is the healthz handler, but your issue appears to be before that (the connection is refused entirely, rather than accepted and answered with some sort of invalid response). I don't know what would cause that; possibly some weird gremlin in the EKS network rules? I can't think of an obvious reason the listener would silently fail to start.
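
To rule out the network-rules theory, you could also probe the port from a throwaway pod in the same cluster. The pod IP below is taken from your events, and busybox is just an example image:

  # Run a one-off pod and try to reach the controller's health port directly
  kubectl run probe-test --rm -it --image=busybox --restart=Never -- \
    wget -qO- http://10.10.4.8:10254/healthz

If that is refused while a request against 127.0.0.1:10254 from inside the controller container succeeds, the listener is fine and something between the pods is dropping traffic; if both are refused, the listener really isn't starting.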

Was there ever a solution found for this issue? I'm running Kong using Helm chart version 2.7.0, and Kong runs fine for a while but then falls into a crash loop with the liveness probe failing. I'm running on AWS EKS 1.21.