What is the best way to monitor Lua memory on Kong? Last week our entire Kong cluster eventually died with spiking CPU. I suspect this is due to memory pressure on the Lua heap.
Can you provide more information about the cluster setup (config and size) and plugins installed?
3 nodes, t2.large (2 CPUs, 8 GB), each node in a separate availability zone, behind an ALB.
About 10 APIs and 20 upstreams, with 3 targets in each upstream; each API uses upstream names to proxy.
A similar set of plugins on each, like cors, key-auth, and acl. We also have a custom plugin based on the PDK that allows fallback to a different zone by doing set_upstream.
And about the traffic?
Do you have logs?
Do you have metrics?
What database are you using?
Are you in debug?
Traffic is about 10 req/sec.
I do have logs and I do have metrics. What specifically do you need?
The database is Cassandra, multi-region.
No, I’m not in debug since this is production, but I can probably spin up an additional instance with the debug option.
What I don’t like about this situation is that all 3 nodes went unresponsive with 100% CPU in a relatively short time. Lua memory is one of my guesses, so I’m wondering how I can monitor it without an additional plugin and
To put more context to this:
In the Kong log files on all instances (access, admin_access, error) there is a 90-minute gap (no activity) for the duration of the CPU spike.
Pick any metric, I have them all, and everything looks good except CPU utilization for nginx, which sat at ~50% and within 10 min climbed to 100%. Nothing was frozen on the instance: I was able to ssh in, look around, and stop and start Kong.
What version of Kong are you running?
I’m asking these things because in the past I had exactly the same behaviour, but after moving everything to the latest version, 1.0.1, and moving to Services and Routes, the whole system stabilized.
But at the same time I switched from containers to dedicated VMs.
This is Kong 0.14.1. We are not ready to upgrade yet, but probably will soon.
Also, we do use services/routes and dedicated VMs (EC2).
Just a note, if you’re running on a burstable T2 instance, you need to check to ensure you haven’t burned through all your CPU cycle credits and the node isn’t getting throttled.
I don’t want to mix CPU into this discussion; I want to talk specifically about Lua memory monitoring. Actually, that would be a very nice feature for Kong metrics: Lua memory size, free/used.
Can you describe what specific metrics you want to monitor within Lua? BTW, I don’t think you’re going to have much luck with it; the Lua VM details are hidden away for obvious reasons. You could look at the memory allocated by each worker process from a system perspective, but things like GC details probably can’t (and shouldn’t) be exposed in that context. If you need that kind of detail for troubleshooting, you might want to have a look at SystemTap and some accompanying tools: https://github.com/openresty/stapxx
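As a rough illustration of the "system perspective" mentioned above, here is a minimal sketch that reads a process's resident set size from Linux's /proc/<pid>/status. The function names parse_vm_rss_kb and worker_rss_kb are hypothetical, and the sketch assumes a Linux host. Note that VmRSS covers the whole worker process, not just the Lua heap, and the blocking file read makes it better suited to an external script than to code running inside a request.

```lua
-- Hypothetical helper: extracts the VmRSS value (resident set size, in kB)
-- from the text of a Linux /proc/<pid>/status file.
-- Returns nil if no VmRSS line is present.
local function parse_vm_rss_kb(status_text)
  local kb = status_text:match("VmRSS:%s*(%d+)%s*kB")
  return kb and tonumber(kb) or nil
end

-- Hypothetical helper: reads /proc/<pid>/status for the given pid
-- (or "self" for the current process).
-- Returns nil when the file cannot be opened, e.g. on non-Linux systems.
local function worker_rss_kb(pid)
  local f = io.open("/proc/" .. tostring(pid) .. "/status", "r")
  if not f then
    return nil
  end
  local text = f:read("*a")
  f:close()
  return parse_vm_rss_kb(text)
end

-- Example: RSS of the process running this script.
print(worker_rss_kb("self"))
```

To cover every nginx worker, the same call could be repeated for each worker PID (for example, as listed by ps).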
This is what we’ve done. Just a note: we are already using a modified statsd plugin, since we need it to send dimensions.
logger:send_statsd("lua.memory", collectgarbage("count") * 1024, logger.stat_types.gauge, nil)
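For anyone who wants to try the same measurement outside our plugin, here is a minimal standalone sketch. collectgarbage("count") returns the heap size of the current Lua VM in kilobytes, hence the multiplication by 1024. The helpers lua_heap_bytes and format_gauge are hypothetical names; format_gauge only stands in for our logger:send_statsd by building a StatsD gauge line, and it does not send anything.

```lua
-- Hypothetical helper: heap size of the current Lua VM in bytes.
-- collectgarbage("count") reports kilobytes, so multiply by 1024.
local function lua_heap_bytes()
  return collectgarbage("count") * 1024
end

-- Hypothetical stand-in for logger:send_statsd: builds a StatsD gauge
-- line of the form "name:value|g" without sending it anywhere.
local function format_gauge(name, value)
  return string.format("%s:%d|g", name, math.floor(value))
end

print(format_gauge("lua.memory", lua_heap_bytes()))
```

Keep in mind that under OpenResty each nginx worker has its own Lua VM, so this value is per worker. With several workers reporting to the same gauge name, the last report wins; tagging the metric with something like ngx.worker.pid() is one way to tell them apart, assuming your statsd setup supports dimensions.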
The reason: we were trying to identify the cause of the high CPU usage, and that part of the system is a black box. Now at least we have some metrics.
looks like this: