@Abhay_Prajapati I wanted to provide you with some information on a knowledge base tool we are testing internally. Please let me know if this type of answer is helpful to you or not.
Thanks-
Rick
Based on the provided knowledge sources, there is no explicit documentation of a known bug or race condition in Kong Ingress Controller (KIC) 3.2.6 that exactly matches the scenario you described: the leader KIC pod appears alive (it still holds the lease) but fails to distribute configuration updates after a partial node failure, leaving gateway pods reporting “no configuration found.”
However, there are several related behaviors and limitations documented that may help explain or troubleshoot your issue:
KIC High Availability and Leader Election
KIC uses leader election to ensure only one controller pushes configuration to Kong Gateway. If the leader pod is unresponsive but still holds the lease (e.g., due to a network partition or resource starvation), it can prevent other KIC pods from taking over, stalling config updates. The documentation notes that if KIC isn’t running, new Kong Gateway instances won’t receive configuration and existing ones will continue to process traffic with their last known config. However, it does not describe a scenario where the leader holds the lease but is no longer functional for config distribution (see the “KIC high availability” documentation).
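One way to check for that state is to compare the election Lease’s renew time against the holder pod’s actual readiness, since a fresh lease alone doesn’t prove the leader is healthy. Here’s a rough sketch using the Python kubernetes client; the lease name, namespace, and label selector are assumptions for illustration (check `kubectl get leases -n kong` for the real election ID in your cluster):

```python
# Sketch: flag a stale KIC leader Lease and cross-check the controller pods'
# Ready condition. Lease name, namespace, and label selector are assumptions.
from datetime import datetime, timezone

from kubernetes import client, config

LEASE_NAME = "5b374a9e.konghq.com"  # assumed election ID; verify in your cluster
NAMESPACE = "kong"                   # assumed namespace
STALE_AFTER_SECONDS = 30             # tune to your leader-election renew deadline

config.load_kube_config()

lease = client.CoordinationV1Api().read_namespaced_lease(LEASE_NAME, NAMESPACE)
holder = lease.spec.holder_identity
renewed = lease.spec.renew_time
if renewed is None:
    print(f"lease {LEASE_NAME} has never been renewed")
else:
    age = (datetime.now(timezone.utc) - renewed).total_seconds()
    print(f"leader: {holder}, lease renewed {age:.0f}s ago")
    if age > STALE_AFTER_SECONDS:
        print("lease looks stale: leader may be starved or partitioned")

# A fresh lease does not prove the leader is functional, so also look at the
# controller pods' Ready condition (label selector is an assumption):
pods = client.CoreV1Api().list_namespaced_pod(
    NAMESPACE, label_selector="app=ingress-kong"
)
for pod in pods.items:
    ready = next(
        (c.status for c in (pod.status.conditions or []) if c.type == "Ready"),
        "Unknown",
    )
    print(f"{pod.metadata.name}: Ready={ready}")
```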
Last Known Good Config
KIC stores the last valid configuration in memory and uses it to configure new pods. If all proxy pods are lost and the controller restarts, the last known good config is lost, and new gateway pods may fail readiness checks with “no configuration found.” This could explain why your gateway pods failed after the node reboot if no valid config was available to restore (see “Last Known Good Config” in the KIC documentation).
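To confirm this is what happened, you can list the gateway pods that are failing readiness and pull their warning events, which is where readiness-probe failure messages like “no configuration found” surface. A minimal sketch, assuming a `kong` namespace and the `app.kubernetes.io/name=kong` label:

```python
# Sketch: report Kong Gateway pods that are not Ready and show their warning
# events. Namespace and label selector are assumptions; adjust to your install.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod("kong", label_selector="app.kubernetes.io/name=kong")
for pod in pods.items:
    for cond in pod.status.conditions or []:
        if cond.type == "Ready" and cond.status != "True":
            print(f"{pod.metadata.name} not Ready: {cond.reason or ''} {cond.message or ''}")
            # Readiness-probe failures show up as pod events, so pull those too:
            events = v1.list_namespaced_event(
                "kong",
                field_selector=f"involvedObject.name={pod.metadata.name}",
            )
            for ev in events.items:
                if ev.type == "Warning":
                    print(f"  event: {ev.reason}: {ev.message}")
```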
Failure Modes and Fallback Configuration
KIC 3.2.0+ includes a fallback configuration feature to help isolate and recover from partial config failures. If enabled, it can help keep unaffected parts of the config operational. However, this requires explicit configuration and may not address issues where the leader is unresponsive but still holds the lease (see “Fallback configuration”).
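Enabling it comes down to turning on the `FallbackConfiguration` feature gate on the controller container. Below is a sketch that patches the controller Deployment; the deployment and container names are assumptions, and `CONTROLLER_FEATURE_GATES` follows KIC’s usual flag-to-environment-variable convention, so please verify both against the “Fallback configuration” docs for your 3.2.x release:

```python
# Sketch: enable KIC's fallback configuration via a feature-gate env var.
# Deployment name, container name, and env var are assumptions; verify them.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "ingress-controller",  # assumed container name
                        "env": [
                            {
                                "name": "CONTROLLER_FEATURE_GATES",
                                "value": "FallbackConfiguration=true",
                            }
                        ],
                    }
                ]
            }
        }
    }
}

# Strategic merge patch: containers and env entries merge by "name".
client.AppsV1Api().patch_namespaced_deployment("kong-controller", "kong", patch)
print("patched; controller pods will restart with the feature gate enabled")
```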
Controller Resource Starvation
If the leader KIC pod’s CPU and network usage dropped to zero, it may have been starved of resources, causing it to stop processing while still holding the lease. The documentation recommends allocating sufficient resources to KIC and the control plane to avoid such issues, especially during node failures or restarts (see “Services and Routes are deleted from Kong Gateway during pod maintenance”).
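As a concrete mitigation, explicit requests and limits on the controller container help keep the scheduler and kubelet from starving it during node churn. A sketch, with assumed names and placeholder sizes that you should base on your controller’s observed usage:

```python
# Sketch: set CPU/memory requests and limits on the KIC container.
# Deployment/container names and the sizes below are assumptions.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "ingress-controller",  # assumed container name
                        "resources": {
                            "requests": {"cpu": "200m", "memory": "256Mi"},
                            "limits": {"cpu": "1", "memory": "512Mi"},
                        },
                    }
                ]
            }
        }
    }
}

client.AppsV1Api().patch_namespaced_deployment("kong-controller", "kong", patch)
```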
Summary:
While there is no direct mention of a bug or race condition matching your exact scenario, the described behaviors are consistent with known limitations around leader election, resource starvation, and loss of the last known good configuration. Recommended mitigations are to ensure sufficient resources for the KIC pods, monitor leader health beyond just lease status, and enable fallback configuration if you haven’t already. If you continue to hit this issue, it may be worth opening a support ticket with Kong for deeper investigation.