What is the best way to monitor Lua memory on Kong? Last week every node in our Kong cluster eventually died with CPU spiking, and I suspect this is due to memory pressure on the Lua heap.
Can you provide more information about the cluster setup (config and size) and plugins installed?
3 nodes, t2.large, 2 CPU, 8 GB, each node in separate availability zone, behind ALB
about 10 APIs, 20 upstreams, 3 targets in each upstream; each API proxies to its upstream by name.
a similar set of plugins on each, like cors, key-auth, acl. we also have a custom plugin based on the PDK that allows fallback to a different zone by calling set_upstream
And about the traffic?
Do you have logs?
Do you have metrics?
What database are you using?
Are you in debug?
traffic about 10 req/sec
I do have logs and I do have metrics, what specifically do you need?
database is cassandra, multi region
no, I’m not in debug as I’m in production, but I can probably spin up an additional instance with the debug option.
what I don’t like about this situation is that all 3 nodes went unresponsive with 100% CPU within a relatively short time. Lua memory is one of my guesses, so I'm wondering how I can monitor it, without an additional plugin and
to put more context to this.
in the Kong log files on all instances (access, admin_access, error) there is a gap (no activity) of 90 min for the duration of the CPU spike.
any metrics you choose, I have them all, and all is good except CPU utilization for nginx, which went to ~50% and within 10 min reached 100%. nothing was frozen on the instance; I was able to ssh in, look around, and stop and start Kong.
What version of Kong are you running?
I'm asking these things because in the past I had exactly the same behaviour, but after moving everything to the latest version (1.0.1) and moving to Services and Routes the whole system stabilized.
But at the same time I switched from containers to dedicated VMs.
this is kong 0.14.1, we are not ready to upgrade yet, but will probably soon.
also we do use services/routes and dedicated VM ( ec2 )
Just a note, if you’re running on a burstable T2 instance, you need to check to ensure you haven’t burned through all your CPU cycle credits and the node isn’t getting throttled.
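To put rough numbers on the credit point: a minimal sketch, assuming the figures AWS publishes for t2.large (36 credits earned per hour, a maximum balance of 864, and one credit equal to one vCPU at 100% for one minute). Worth checking those against the current AWS tables and against the CPUCreditBalance metric in CloudWatch; the function name is made up for illustration.

```python
def hours_until_throttled(balance, vcpus, utilization, earn_per_hour):
    """Estimate hours until a burstable instance runs out of CPU credits.

    balance       -- current credit balance
    vcpus         -- number of vCPUs on the instance
    utilization   -- average utilization per vCPU, 0.0 to 1.0
    earn_per_hour -- credits earned per hour (36 assumed for t2.large)
    """
    # One credit = one vCPU at 100% for one minute, so 60 credits per vCPU-hour.
    burn_per_hour = vcpus * utilization * 60
    net_burn = burn_per_hour - earn_per_hour
    if net_burn <= 0:
        # At or below baseline the balance never drains.
        return float("inf")
    return balance / net_burn


# Full balance, both vCPUs pinned at 100%.
print(hours_until_throttled(864, 2, 1.0, 36))
```

With both vCPUs pinned the net burn is 84 credits per hour, so even a full balance is gone in a little over 10 hours, and a node that was already running hot would get there much sooner.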
I don’t want to mix CPU into this discussion; I specifically want to talk about Lua memory monitoring. Actually that would be a very nice feature for Kong metrics: Lua memory size, free/used
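One plugin-free option in the meantime is to watch the resident memory of the nginx worker processes from outside Kong. This is only a proxy: RSS covers the whole worker, not just the Lua heap (the exact Lua-side figure comes from `collectgarbage("count")`, which has to run inside each worker). A minimal sketch, assuming a Linux host with /proc and that the workers show up with the usual "nginx: worker process" title; the function names are made up for illustration.

```python
import os


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def nginx_worker_rss_kb(proc_root="/proc"):
    """Map pid -> resident memory in kB for every nginx worker process found."""
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                title = f.read().replace(b"\0", b" ").decode(errors="replace")
            # Assumption: workers carry nginx's default process title.
            if not title.startswith("nginx: worker process"):
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable; skip it
        if rss is not None:
            result[int(entry)] = rss
    return result
```

Calling `nginx_worker_rss_kb()` every minute from cron or a metrics agent and graphing the values per pid would at least show whether the workers grow steadily before a spike.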