Kong-pongo hangs with kong-plugin

Hi,

I am trying to run kong pongo with https://github.com/Kong/kong-plugin and it hangs.

Stopping after installing dependencies for kong-plugin-myplugin 0.1.0-1

Kong version: 1.4.3
[==========] Running tests from scanned files.
[----------] Global test environment setup.
[----------] Running tests from /kong-plugin/spec/myplugin/01-access_spec.lua
[ RUN      ] /kong-plugin/spec/myplugin/01-access_spec.lua @ 54: myplugin: (access) [#postgres] request gets a 'hello-world' header
[       OK ] /kong-plugin/spec/myplugin/01-access_spec.lua @ 54: myplugin: (access) [#postgres] request gets a 'hello-world' header (33.05 ms)
[ RUN      ] /kong-plugin/spec/myplugin/01-access_spec.lua @ 74: myplugin: (access) [#postgres] response gets a 'bye-world' header
[       OK ] /kong-plugin/spec/myplugin/01-access_spec.lua @ 74: myplugin: (access) [#postgres] response gets a 'bye-world' header (2.51 ms)

To me it looks like, the issue is in client:send here https://github.com/Kong/kong-plugin/blob/master/spec/myplugin/01-access_spec.lua#L55. How do I debug this further? Sorry I am a newbie to kong and lua.

This is on macOS catalina btw.

How long did you wait?

With the basic template, what you did is only the Postgres tests, it will repeat those tests, but then with Cassandra. Before the first Cassandra test is ran it will run database migrations, and they are slow. On my system it might take up to 30 seconds, if you have a less power packed system it might be more, or even in a really bad case, maybe the underlying containers run into trouble.

To test it you can try with or without cassandra.

try

# stop everything, and restart without cassandra
pongo down
pongo run --no-cassandra -- --exclude-tags=cassandra

The options before -- are for Pongo, the ones after will be passed to Busted for testing

Thanks @Tieske, I guess the issue is with cassandra. The test runs fine without cassandra. Here are the logs

/kong # cat /kong-plugin/servroot/logs/error.log
2020/02/13 03:25:33 [debug] 42#0: [lua] globalpatches.lua:10: installing the globalpatches
2020/02/13 03:25:33 [debug] 42#0: [lua] globalpatches.lua:243: randomseed(): seeding PRNG from OpenSSL RAND_bytes()
2020/02/13 03:25:33 [debug] 42#0: [lua] globalpatches.lua:269: randomseed(): random seed: 121512341570 for worker nb 0
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:449: init(): [dns-client] (re)configuring dns client
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:454: init(): [dns-client] staleTtl = 4
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:457: init(): [dns-client] validTtl = nil
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:461: init(): [dns-client] noSynchronisation = false
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:480: init(): [dns-client] query order = LAST, SRV, A, CNAME
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: grpcs_2.test = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: grpcs_1.test = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: custom_localhost = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: lambda.us-east-1.amazonaws.com = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:579: init(): [dns-client] nameserver 127.0.0.11:53
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:584: init(): [dns-client] attempts = 5
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:593: init(): [dns-client] timeout = 2000 ms
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:597: init(): [dns-client] ndots = 0
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:599: init(): [dns-client] search =
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:605: init(): [dns-client] badTtl = 30 s
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:607: init(): [dns-client] emptyTtl = 1 s
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:123: check_db_against_config(): Discovering used plugins
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:449: init(): [dns-client] (re)configuring dns client
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:454: init(): [dns-client] staleTtl = 4
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:457: init(): [dns-client] validTtl = nil
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:461: init(): [dns-client] noSynchronisation = false
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:480: init(): [dns-client] query order = LAST, SRV, A, CNAME
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: grpcs_2.test = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: grpcs_1.test = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: custom_localhost = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:520: init(): [dns-client] adding A-record from 'hosts' file: lambda.us-east-1.amazonaws.com = 127.0.0.1
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:579: init(): [dns-client] nameserver 127.0.0.11:53
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:584: init(): [dns-client] attempts = 5
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:593: init(): [dns-client] timeout = 2000 ms
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:597: init(): [dns-client] ndots = 0
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:599: init(): [dns-client] search =
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:605: init(): [dns-client] badTtl = 30 s
2020/02/13 03:25:33 [debug] 42#0: [lua] client.lua:607: init(): [dns-client] emptyTtl = 1 s
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: correlation-id
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: pre-function
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: cors
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: ldap-auth
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: loggly
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: hmac-auth
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'hmac-auth.hmacauth_credentials'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: zipkin
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: request-size-limiting
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: azure-functions
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: request-transformer
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: oauth2
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'oauth2.oauth2_credentials'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'oauth2.oauth2_authorization_codes'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'oauth2.oauth2_tokens'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: response-transformer
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: ip-restriction
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: statsd
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: jwt
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'jwt.jwt_secrets'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: proxy-cache
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: basic-auth
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'basic-auth.basicauth_credentials'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: key-auth
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'key-auth.keyauth_credentials'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: http-log
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: myplugin
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: datadog
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: tcp-log
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: rate-limiting
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: post-function
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: prometheus
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: acl
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'acl.acls'
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: kubernetes-sidecar-injector
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: syslog
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: file-log
2020/02/13 03:25:33 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: udp-log
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: response-ratelimiting
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: aws-lambda
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: session
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:201: loader_fn(): Loading custom plugin entity: 'session.sessions'
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: bot-detection
2020/02/13 03:25:34 [debug] 42#0: [lua] plugins.lua:239: load_plugin(): Loading plugin: request-termination
2020/02/13 03:25:34 [notice] 42#0: using the "epoll" event method
2020/02/13 03:25:34 [notice] 42#0: openresty/1.15.8.2
2020/02/13 03:25:34 [notice] 42#0: built by gcc 9.2.0 (Alpine 9.2.0)
2020/02/13 03:25:34 [notice] 42#0: OS: Linux 4.19.76-linuxkit
2020/02/13 03:25:34 [notice] 42#0: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2020/02/13 03:25:34 [notice] 47#0: start worker processes
2020/02/13 03:25:34 [notice] 47#0: start worker process 48
2020/02/13 03:25:34 [debug] 48#0: *1 [lua] globalpatches.lua:243: randomseed(): seeding PRNG from OpenSSL RAND_bytes()
2020/02/13 03:25:34 [debug] 48#0: *1 [lua] globalpatches.lua:269: randomseed(): random seed: 119157918811 for worker nb 0
2020/02/13 03:25:34 [debug] 48#0: *1 [lua] events.lua:211: do_event_json(): worker-events: handling event; source=resty-worker-events, event=started, pid=48, data=nil
2020/02/13 03:25:34 [notice] 48#0: *1 [lua] cache_warmup.lua:42: cache_warmup_single_entity(): Preloading 'services' into the cache ..., context: init_worker_by_lua*
2020/02/13 03:25:34 [notice] 48#0: *1 [lua] cache_warmup.lua:81: cache_warmup_single_entity(): finished preloading 'services' into the cache (in 41ms), context: init_worker_by_lua*
2020/02/13 03:25:34 [notice] 48#0: *1 [lua] cache_warmup.lua:42: cache_warmup_single_entity(): Preloading 'plugins' into the cache ..., context: init_worker_by_lua*
2020/02/13 03:25:34 [notice] 48#0: *1 [lua] cache_warmup.lua:81: cache_warmup_single_entity(): finished preloading 'plugins' into the cache (in 9ms), context: init_worker_by_lua*
2020/02/13 03:25:34 [debug] 48#0: *2 [lua] init.lua:852: balancer(): setting address (try 1): 127.0.0.1:15555
2020/02/13 03:25:34 [info] 48#0: *2 client 127.0.0.1 closed keepalive connection
2020/02/13 03:25:34 [debug] 48#0: *8 [lua] init.lua:852: balancer(): setting address (try 1): 127.0.0.1:15555
2020/02/13 03:25:34 [info] 48#0: *8 client 127.0.0.1 closed keepalive connection
2020/02/13 03:25:34 [notice] 47#0: signal 15 (SIGTERM) received from 56, exiting
2020/02/13 03:25:34 [notice] 48#0: exiting
2020/02/13 03:25:34 [notice] 48#0: exit
2020/02/13 03:25:34 [notice] 47#0: signal 17 (SIGCHLD) received from 48
2020/02/13 03:25:34 [notice] 47#0: worker process 48 exited with code 0
2020/02/13 03:25:34 [notice] 47#0: exit
/kong # cat /kong-plugin/servroot/logs/access.log
127.0.0.1 - - [13/Feb/2020:03:25:34 +0000] "GET /request HTTP/1.1" 200 939 "-" "lua-resty-http/0.14 (Lua) ngx_lua/10015"
127.0.0.1 - - [13/Feb/2020:03:25:34 +0000] "GET /request HTTP/1.1" 200 939 "-" "lua-resty-http/0.14 (Lua) ngx_lua/10015"
127.0.0.1 - - [13/Feb/2020:03:25:34 +0000] "GET /request HTTP/1.1" 200 939 "-" "lua-resty-http/0.14 (Lua) ngx_lua/10015"
127.0.0.1 - - [13/Feb/2020:03:25:34 +0000] "GET /request HTTP/1.1" 200 939 "-" "lua-resty-http/0.14 (Lua) ngx_lua/10015"

Is there a place where cassandra migrations are logged and I can see its progress?

@Tieske Another interesting observation. The issue only happens in my home network and not when I am in office. Also, if I turn off my wifi at home, it works! What is happening here?

you can do the following;

# restart env to start with clean DB
pongo down
pongo up
pongo shell
$ export KONG_DATABASE=cassandra
$ kong migrations up
$ kong start

So, the hang is actually at resolving cassandra contact point.

2020/02/13 23:30:49 [debug] resolved Cassandra contact point 'cassandra' to: 172.23.0.3

I am further trying to debug why it happens only under certain scenarios as I described earlier.

@Tieske The exact line where this hang is this:

hang is in socket receive here

Also, the default value of 2000 is taken as socket timeout. It’s unit is in milliseconds, but when I debugged, it appears to be seconds and hence the socket connection is not timing out. I tried overriding it with a value of 2 instead of 2000, the wait was then 2 seconds for each socket connection and the hang went away as it timeout out now.

thanks for the thorough research on this. This seems like a combination of issues. Apparently your network setup deviates from expectations, since depending on the network you’re in it either works or doesn’t.

But the timeout issue you pinpointed seems to be a bug that exacerbates the network problem.

So why does the timeout have a mismatch? Kong runs on top of OpenResty. The OpenResty socket implementation is supposed to be compatible with the traditional LuaSocket library, but the timeouts were at some point specified in milliseconds, where the original does it in seconds.

So why is LuaSocket relevant? because in some contexts the OpenResty sockets are unavailable, and in those cases Kong will fall back on LuaSocket for socket operations.

So Kong’s DNS library expects OpenResty sockets, and sets the timeout in milliseconds, in some cases it gets a LuaSocket socket and then it accidentally sets the timeout to 2000 seconds, where 2000 milli-seconds were intended.

So here’s a patch: https://github.com/Kong/kong/pull/5571/files

Can you try whether that fixes the issue?

# restart env to start with clean DB
pongo down
pongo up
pongo shell
$ apk add nano
$ nano /usr/local/share/lua/5.1/kong/db/strategies/cassandra/connector.lua

# manually apply the patch, and save the file

$ export KONG_DATABASE=cassandra
$ kong migrations up
$ kong start

Thanks for the prompt fix @Tieske. I can confirm that the above patch works after kong migrations bootstrap. Would this fix be backported to 1.4, 1.5 and 2.0 versions?

I don’t think there will be a backport of this fix, considering it only shows on top of another problem (a network issue)

But if there’s going to be another 2.0.x release, it might make it in there