Hello, fellow Kong Community members.
We at Dream11 have been (happily) using Kong for a few months now, and in the process we developed a number of custom plugins, some of which we have open-sourced:
kong-circuit-breaker
kong-scalable-rate-limiter
kong-host-interpolate-by-header
We also have a couple more plugins in the pipeline that we plan to open-source.
We plan to create a “Load Shedding” plugin to increase resiliency at the API Gateway layer. At a high level, the plugin would shed any additional load Kong is unable to handle, protecting Kong from crashing under overload and maintaining Quality of Service (QoS).
A load shedding plugin could also save costs by reducing the buffer infra provisioned to handle spiky traffic. If the plugin guarantees Kong does not go down while maintaining QoS, we could reduce the amount of (safe) over-provisioning needed to absorb sudden increases in traffic.
We went through the following resources to get started:
https://vikas-kumar.medium.com/handling-overload-with-concurrency-control-and-load-shedding-part-1-1a7f76d2a1dd
https://tech.olx.com/load-shedding-with-nginx-using-adaptive-concurrency-control-part-1-e59c7da6a6df
https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94
https://eng.uber.com/qalm-qos-load-management-framework/
https://www.youtube.com/watch?v=XNEIkivvaV4
We thought shedding load on the basis of in-flight requests (IFR) would be a good start, so we created a custom plugin and tested it out. We found that a rigid IFR limit does not work, because IFR depends on latency as well: by Little's law, in-flight requests ≈ throughput × latency, so an IFR limit tuned on a load with 5 ms average latency would not work for a load with 20 ms average latency. In the latter case we would shed load even though the system is not overloaded (CPU usage is low).
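To illustrate the problem, here is a minimal sketch (in Python, purely for illustration; the actual plugin is not written this way) of a fixed-limit IFR shedder. The names and limits are hypothetical:

```python
import threading

class InFlightLimiter:
    """Sheds requests once the concurrent in-flight count exceeds a fixed limit.

    The weakness described above: by Little's law, in-flight ~= throughput x latency,
    so the same request rate at 20 ms latency holds ~4x the in-flight requests
    it would at 5 ms latency. A limit tuned at 5 ms sheds healthy traffic at 20 ms.
    """

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # rigid limit, tuned for one latency profile
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        """Called at the start of a request; False means shed (e.g. respond 503)."""
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            return True

    def release(self):
        """Called when the request completes."""
        with self.lock:
            self.in_flight -= 1
```

The shedding decision here looks only at the counter, which is exactly why it conflates "slow upstream" with "overloaded gateway".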
So we are now working on incorporating CPU usage into the shedding decision. Using CPU usage along with IFR would ensure we only shed load when Kong is actually overloaded, countering the latency issue described above. This approach, however, is not final and could change as we gain a better understanding of which approach works best for this use case.
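One way the combined check could look, as a sketch under our current thinking (thresholds and function names are illustrative assumptions, not the plugin's actual configuration):

```python
def should_shed(in_flight, cpu_percent,
                max_in_flight=100, cpu_threshold=80.0):
    """Shed only when BOTH signals indicate overload.

    High IFR alone may just mean a slower (but healthy) upstream, so
    requiring high CPU as well avoids shedding while Kong itself still
    has headroom. All threshold values here are hypothetical.
    """
    return in_flight > max_in_flight and cpu_percent > cpu_threshold
```

Requiring both signals is what counters the latency sensitivity: a 20 ms-latency workload may push IFR past the limit while CPU stays low, and such requests would still be admitted.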
We would love to hear your thoughts on this. Please share if you have faced similar problems while using Kong, and feel free to mention other related use cases as well. We look forward to discussions and possible collaborations on this plugin.