Kong Kubernetes pod does not start after system restart

#1
  1. Installed the Kong API gateway on my laptop’s Docker-for-Desktop k8s cluster.
  2. It works well.
  3. When the k8s cluster is restarted, k8s comes up and the Postgres pod starts successfully.
  4. However, the Kong pod does not start. It shows a ‘Terminated: Error’ status.
  5. I deleted the pod and k8s tried to create a new one, but it stays in ‘Waiting: PodInitializing’ status forever.

Summary: after a cluster restart, the Kong pod does not start successfully.

Please help if anyone has experienced this issue before.

0 Likes

#2

Hi @Prashant_Shandilya

  1. How have you installed Kong on Kubernetes? Could you share your deployment spec?
  2. Did you check the logs of the pods which failed to come up?
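For reference, commands along these lines would capture both; the kg release name, kg namespace, and container names below are assumptions based on the rest of this thread, and the pod name is a placeholder:

# Rendered chart values and the Kong deployment spec
helm get values kg
kubectl -n kg get deployment kg-kong -o yaml

# Logs from the failing pod's kong container
kubectl -n kg logs <kong-pod-name> -c kong --previous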
0 Likes

#3

Hi @hbagdi
Please refer to the screenshot below of my k8s deployment.

I did check the logs; they read as below:

container "kong" in pod "bm-kongv1-kong-d55969f-x542v" is waiting to start: PodInitializing

I was able to reproduce the issue consistently on another instance.

  1. Installed Kong with:

helm install --name kg stable/kong

  2. It installed the Kong gateway; everything was up and running.
  3. Restarted Docker Desktop.
  4. All other k8s services started correctly, except for the Kong service (status check sketched below).
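After the restart, the stuck state can be confirmed with something like the following; the namespace is assumed and the pod name is only an example taken from the describe output further down:

kubectl -n kg get pods
# Show the init container state for the stuck pod
kubectl -n kg get pod kg-kong-6cc76cdcb9-xdlbr -o jsonpath='{.status.initContainerStatuses[*].state}'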

Logs -

2019/04/09 09:15:51 [notice] 1#0: using the "epoll" event method
2019/04/09 09:15:51 [notice] 1#0: openresty/1.13.6.2
2019/04/09 09:15:51 [notice] 1#0: built by gcc 6.3.0 (Alpine 6.3.0)
2019/04/09 09:15:51 [notice] 1#0: OS: Linux 4.9.125-linuxkit
2019/04/09 09:15:51 [notice] 1#0: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2019/04/09 09:15:51 [notice] 1#0: start worker processes
2019/04/09 09:15:51 [notice] 1#0: start worker process 37
2019/04/09 09:15:51 [notice] 1#0: start worker process 38
10.1.0.1 - - [09/Apr/2019:09:15:51 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:01 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:04 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:11 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:21 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:31 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:34 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:41 +0000] "GET /status HTTP/1.1" 200 205 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:16:51 +0000] "GET /status HTTP/1.1" 200 207 "-" "kube-probe/1.10"
192.168.65.3 - - [09/Apr/2019:09:16:51 +0000] "GET / HTTP/1.1" 200 5567 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
192.168.65.3 - - [09/Apr/2019:09:16:52 +0000] "GET /favicon.ico HTTP/1.1" 404 23 "https://localhost:31713/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
10.1.0.1 - - [09/Apr/2019:09:17:01 +0000] "GET /status HTTP/1.1" 200 208 "-" "kube-probe/1.10"
10.1.0.1 - - [09/Apr/2019:09:17:04 +0000] "GET /status HTTP/1.1" 200 208 "-" "kube-probe/1.10"
2019/04/09 09:17:10 [notice] 37#0: signal 15 (SIGTERM) received, exiting

0 Likes

#4

What’s the output if you describe the pod?
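i.e. something like the following, with the namespace assumed and the actual pod name substituted:

kubectl -n kg describe pod <kong-pod-name>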

0 Likes

#5

Name:           kg-kong-6cc76cdcb9-xdlbr
Namespace:      kg
Node:           docker-for-desktop/192.168.65.3
Start Time:     Tue, 09 Apr 2019 14:42:41 +0530
Labels:         app=kong
                component=app
                pod-template-hash=2773278765
                release=kg
Annotations:
Status:         Pending
IP:             10.1.1.147
Controlled By:  ReplicaSet/kg-kong-6cc76cdcb9
Init Containers:
  wait-for-db:
    Container ID:  docker://71dc0fab06deb5ce555b5fc9b788bb642a4939f220d0989dc596de0ae1b92347
    Image:         kong:1.0.2
    Image ID:      docker-pullable://kong@sha256:555863cf0b3cfae8fc9265f8dd36f0db30fafc0ac7791be0c29f70f8c9b130e8
    Port:
    Host Port:
    Command:
      /bin/sh
      -c
      until kong start; do echo 'waiting for db'; sleep 1; done; kong stop
    State:          Running
      Started:      Tue, 09 Apr 2019 14:51:07 +0530
    Ready:          False
    Restart Count:  1
    Environment:
      KONG_PROXY_ACCESS_LOG:  /dev/stdout
      KONG_ADMIN_ACCESS_LOG:  /dev/stdout
      KONG_PROXY_ERROR_LOG:   /dev/stderr
      KONG_ADMIN_ERROR_LOG:   /dev/stderr
      KONG_PG_HOST:           kg-postgresql
      KONG_PG_PORT:           5432
      KONG_PG_PASSWORD:       <set to the key 'postgresql-password' in secret 'kg-postgresql'>  Optional: false
      KONG_DATABASE:          postgres
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wqfg6 (ro)
Containers:
  kong:
    Container ID:  docker://d8104d677642e1c8f002405fb86e4316016f277932447dc2ce702c095052803a
    Image:         kong:1.0.2
    Image ID:      docker-pullable://kong@sha256:555863cf0b3cfae8fc9265f8dd36f0db30fafc0ac7791be0c29f70f8c9b130e8
    Ports:         8444/TCP, 8000/TCP, 8443/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    State:          Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Tue, 09 Apr 2019 14:45:12 +0530
      Finished:     Tue, 09 Apr 2019 14:49:25 +0530
    Ready:          False
    Restart Count:  0
    Liveness:       http-get https://:admin/status delay=30s timeout=5s period=30s #success=1 #failure=5
    Readiness:      http-get https://:admin/status delay=30s timeout=1s period=10s #success=1 #failure=5
    Environment:
      KONG_ADMIN_LISTEN:      0.0.0.0:8444 ssl
      KONG_PROXY_LISTEN:      0.0.0.0:8000,0.0.0.0:8443 ssl
      KONG_NGINX_DAEMON:      off
      KONG_PROXY_ACCESS_LOG:  /dev/stdout
      KONG_ADMIN_ACCESS_LOG:  /dev/stdout
      KONG_PROXY_ERROR_LOG:   /dev/stderr
      KONG_ADMIN_ERROR_LOG:   /dev/stderr
      KONG_DATABASE:          postgres
      KONG_PG_HOST:           kg-postgresql
      KONG_PG_PORT:           5432
      KONG_PG_PASSWORD:       <set to the key 'postgresql-password' in secret 'kg-postgresql'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wqfg6 (ro)
Conditions:
  Type           Status
  Initialized    False
  Ready          False
  PodScheduled   True
Volumes:
  default-token-wqfg6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-wqfg6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:

0 Likes

#6

To narrow down the exact root cause, I tried out several permutations:

  1. Scale the Kong pod down and back up (from 0 to 3): works (commands sketched below).
  2. Restart the k8s cluster: works.
  3. Restart ‘Docker for Desktop’: does not work, and scaling down and back up does not recover it either.
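The scale down/up in item 1 was along these lines; the kg-kong deployment name is assumed from the ReplicaSet shown in the describe output above:

kubectl -n kg scale deployment kg-kong --replicas=0
kubectl -n kg scale deployment kg-kong --replicas=3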
0 Likes

#7

Based on that snippet, it looks like k8s hasn’t told Kong to start because the initContainer hasn’t triggered it(?)

What logs show up in that initContainer? Can you exec into it and determine why it’s hung?
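For example, something like this (namespace assumed, substitute the actual pod name):

# Logs of the init container
kubectl -n kg logs <kong-pod-name> -c wait-for-db
# Shell into the init container to see what it is waiting on
kubectl -n kg exec -it <kong-pod-name> -c wait-for-db -- /bin/sh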

0 Likes

#8

Error log -

waiting for db
database needs bootstrapping; run 'kong migrations bootstrap'
Error: /usr/local/share/lua/5.1/kong/cmd/start.lua:50: nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/init.lua:281: database needs bootstrap; run 'kong migrations bootstrap'
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/kong/init.lua:281: in function 'init'
init_by_lua:3: in main chunk
2019-04-10T07:35:37.237272700Z
2019-04-10T07:35:37.237277900Z
Run with --v (verbose) or --vv (debug) for more details
waiting for db
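For context, the bootstrap the error asks for is a one-off kong migrations bootstrap run against the database; from the init container it would look roughly like this. The pod name is a placeholder, and whether running it is safe depends on why the schema disappeared in the first place:

kubectl -n kg exec -it <kong-pod-name> -c wait-for-db -- kong migrations bootstrap --v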

0 Likes

#9

Very interesting. Continuing down this rabbit hole: while exec’d into that container, can you connect to and introspect the database? Are all the prerequisite databases/tables present?
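For example, from the Postgres side; the pod name, user, and database below are assumptions (stable/kong chart defaults), and psql may prompt for the password stored in the kg-postgresql secret:

# List databases, then the tables in the kong database
kubectl -n kg exec -it <postgres-pod-name> -- psql -U kong -d kong -c '\l'
kubectl -n kg exec -it <postgres-pod-name> -- psql -U kong -d kong -c '\dt'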

0 Likes

#10

Tried the same setup on a k8s cluster set up on Ubuntu (AWS EC2). Had stability issues there as well :frowning:

0 Likes

#11

With the same symptoms? i.e. the same errors as above?

0 Likes

#12

I had suspected that Docker for Desktop’s Kubernetes implementation might have a bug in your previous case, but the fact that this is possible on EKS (are you using EKS?) is very odd.

I myself have Kong running in a GKE cluster and that doesn’t seem to have this problem (yet).

I’d like to point out that there are two separate problems here:

  • The first error, about the database needing a bootstrap, means that the Postgres Pod’s backing store had a problem and a database reset somehow happened.
  • The problem you see on AWS is a different error: a particular migration is missing.

Could you check your Postgres deployment?
Were there any pod restarts for Postgres around the time Kong started failing?
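A quick way to check that (namespace assumed, pod name a placeholder):

# Restart count and recent events for the Postgres pod
kubectl -n kg get pods
kubectl -n kg describe pod <postgres-pod-name>
# Logs from the previous run, if it did restart
kubectl -n kg logs <postgres-pod-name> --previous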

As @hutchic points out, could you please log into the Postgres DB and list the tables that are in the database? Additionally, please paste the contents of the schema and schema_meta tables into a GitHub Gist and post the URL here.
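For example (same assumptions about pod name, user, and database as above), the schema_meta contents could be dumped like this, and the other table the same way:

kubectl -n kg exec -it <postgres-pod-name> -- psql -U kong -d kong -c 'SELECT * FROM schema_meta;'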

1 Like