Frequent "building a new plugins iterator"

We have an intermittent issue that causes Kong API Gateway to stop responding to requests. The root cause is not clear at this point, but we have made a few observations that could help with troubleshooting. First, this is our environment:

  • Kong 3.3.0 in DB-less mode
  • KIC 2.8.1
  • Python Kong-PDK 0.33
  • k8s 1.24.12
  • Kong Helm Chart 2.23.0

Everything seems to be running fine from the API perspective, but the Kong logs are full of these messages:

declarative reconfigure was started on worker #0
[DB cache] purging (local) cache
building a new plugins iterator

AFAIU, reconfiguration can be triggered by k8s infrastructure changes, but it is unclear why it purges the whole cache, which results in rebuilding the plugins iterator. This can happen a dozen times in a single second. Occasionally Kong becomes completely unresponsive, and when that happens we start to see:

Could not claim instance_id for {{PLUGIN_NAME}} (key: {{PLUGIN_ID}})

Memory and CPU usage is stable and below 50%.

Any idea what could be the root cause? Which k8s changes trigger a reconfigure?

It turns out that if the declarative configuration has changed, all caches are purged and the previous information about plugins is invalidated. This leads to new plugin instances being created. The thing is that our upstream services run on spot instances and their IPs change pretty often, which leads to updates to the declarative configuration. However, the plugins never change, and it does not make sense to reload them every time. The problem we are observing, where one of the pods becomes unresponsive (it gets stuck on Could not claim instance_id), could be mitigated by avoiding frequent plugin reloads: it makes sense to reload plugins only when the plugins hash has changed.
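
To illustrate the idea of reloading plugins only when the plugins hash has changed, here is a minimal Python sketch. It is a toy model, not Kong's actual code: the Gateway class, the plugins_hash helper and the config layout are all hypothetical names made up for this example, and treating a missing hash as "changed" is an assumption.

```python
import hashlib
import json


def plugins_hash(plugins):
    """Hash only the plugin section of a config (hypothetical helper)."""
    canonical = json.dumps(plugins, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class Gateway:
    """Toy model of a worker; not Kong's actual implementation."""

    def __init__(self):
        self.targets = []
        self.current_plugins_hash = None
        self.iterator_builds = 0

    def reconfigure(self, config, new_plugins_hash):
        # Routing data (for example upstream target IPs) is always refreshed.
        self.targets = list(config.get("targets", []))
        # Assumption: a missing hash cannot prove that plugins are unchanged,
        # so it is treated the same as a changed hash.
        if new_plugins_hash is None or new_plugins_hash != self.current_plugins_hash:
            self.iterator_builds += 1
            self.current_plugins_hash = new_plugins_hash


plugins = [{"name": "my-plugin", "config": {"limit": 10}}]
gw = Gateway()
# Three reconfigurations where only the upstream IP changes.
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    gw.reconfigure({"targets": [ip], "plugins": plugins}, plugins_hash(plugins))
```

In this model the iterator is built once for the three reconfigurations, because the plugin section hashes to the same value each time. If the hash passed in were missing on every call, the same loop would rebuild the iterator three times.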

What could make https://github.com/Kong/kong/blob/master/kong/runloop/handler.lua#L640 turn true given that plugins never get changed?

Answering myself: plugins_hash always comes back as nil from /config?check_hash=1, so the plugins iterator gets rebuilt on every reconfiguration.

A potential issue could be that reset_instance is called only for the "not ready" or "no plugin instance" errors. If any other error occurs, the non-initialized plugin instance is not cleaned up, and other threads never make it through the while loop in get_instance_id.
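
The suspected failure mode can be sketched with a small Python model. This is only an illustration of the pattern described above, not a port of Kong's Lua code: InstanceRegistry, the cleanup_on_any_error flag, the timeout value and the flaky start function are all hypothetical, and the error text merely mimics the log line.

```python
import time


class InstanceRegistry:
    """Toy model of a per-plugin instance cache (hypothetical, not Kong code)."""

    def __init__(self, start_instance, cleanup_on_any_error):
        self.start_instance = start_instance      # callable that may raise
        self.cleanup_on_any_error = cleanup_on_any_error
        self.instances = {}                       # key -> {"id": int or None}

    def get_instance_id(self, key, timeout=0.05):
        deadline = time.monotonic() + timeout
        # Wait while a slot is claimed but its instance is not initialized.
        while key in self.instances and self.instances[key]["id"] is None:
            if time.monotonic() > deadline:
                raise TimeoutError(f"Could not claim instance_id (key: {key})")
            time.sleep(0.001)
        if key in self.instances:
            return self.instances[key]["id"]
        self.instances[key] = {"id": None}        # claim the slot
        try:
            self.instances[key]["id"] = self.start_instance()
        except Exception:
            if self.cleanup_on_any_error:
                del self.instances[key]           # release the stale claim
            raise
        return self.instances[key]["id"]


def flaky_start_factory():
    """Return a start function that fails once, then succeeds."""
    calls = {"n": 0}

    def start():
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("unexpected error")
        return 42

    return start


def second_call_outcome(cleanup_on_any_error):
    registry = InstanceRegistry(flaky_start_factory(), cleanup_on_any_error)
    try:
        registry.get_instance_id("plugin-1")      # first call hits the error
    except RuntimeError:
        pass
    try:
        return registry.get_instance_id("plugin-1")
    except TimeoutError as err:
        return str(err)
```

Without cleanup, the second caller finds a claimed but never initialized slot and gives up after the timeout; with cleanup on any error, it can claim the slot again and start the instance.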

I filed a bug report: "Could not claim instance_id for {{PLUGIN_NAME}} (key: {{PLUGIN_ID}})", issue #11173 in Kong/kong on GitHub.