Kong gateway is intermittently down

Our on-premise Kong stops responding for a few seconds to a few minutes at a time, and this happens frequently.
During these windows even Prometheus cannot scrape metrics, so it marks the service as down (we use Kong's Prometheus plugin).
We didn't see any abnormal activity, such as a spike in request count or latency.
There is no spike in CPU or memory.
Nothing is logged in error.log.

How can we debug this further?
Is there a way to see what is happening during these windows? Which logs or metrics can help us here?

Our Kong setup:

  • On-premise setup of the open-source version, backed by Postgres.
  • Routes are added and deleted dynamically through the Admin API (this happens continuously).
  • Route-level plugins handle routing for a few routes, e.g. rewriting the upstream or URL.
  • The default Kong and Nginx configuration is mostly unchanged.

We identified the issue.
It was due to os.execute("sleep 60"), causing the worker to accept…
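
For anyone hitting the same symptom, here is a minimal sketch of the difference between a blocking shell sleep and a cooperative wait in Lua. The helper names (blocking_wait, cooperative_wait, seconds_held) are hypothetical and are not taken from our plugin. The sketch assumes a system with a `sleep` command, and it assumes the code runs in an OpenResty phase where yielding is allowed (ngx.sleep is not available in every phase). A worker stuck in a blocking call would also be unable to answer the metrics scrape, which would be consistent with Prometheus marking the service as down.

```lua
-- Hypothetical helpers for illustration; not the code from our plugin.

-- Blocking wait: os.execute runs a shell command and returns only when it
-- exits, so the OS process that called it does nothing else in the meantime.
local function blocking_wait(seconds)
  return os.execute("sleep " .. tostring(seconds))
end

-- Cooperative wait: under OpenResty, ngx.sleep suspends only the current
-- request's coroutine and lets the worker keep serving other connections.
-- Outside OpenResty (plain Lua) there is no `ngx` global, so fall back.
local function cooperative_wait(seconds)
  if ngx and ngx.sleep then
    ngx.sleep(seconds)
  else
    blocking_wait(seconds)
  end
end

-- Measure how long a call holds the caller (whole seconds, via os.time).
local function seconds_held(fn, ...)
  local started = os.time()
  fn(...)
  return os.time() - started
end
```

Run in a plain Lua interpreter, seconds_held(blocking_wait, 1) shows how long the caller is held by the shell command. Inside a Kong worker, the same blocking call holds every connection that worker owns, whereas the ngx.sleep path only pauses the one request.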