Kong data plane with service mesh sidecar enabled cannot send traces to the Grafana agent (DNS resolution timeout)

After onboarding the Kong data plane workloads with service mesh enabled via a sidecar, the Kong data plane application has trouble sending traces to the Grafana agent.

The Zipkin plugin sends the traces to the Grafana agent host.

Error

failed the initial dns/balancer resolve for mesh service 'grafana-agent-traces.svc.cluster.local' with: failed to receive the reply length field from TCP server 100.1.0.9:53: timeout.

Error logs in the Kong data planes:

zipkin request failed: [cosocket] DNS resolution failed: failed to receive the reply length field from TCP server 100.1.0.9:53: timeout. Tried: ["(short)grafana-agent-traces.svc:(na) - cache-miss"

When the Grafana host is configured with its Kubernetes DNS name, the Kong data plane cannot connect to it and the request times out. When the IP of the Grafana service is given instead, the connection to the Grafana agents works fine.

This only happens when the service mesh sidecar is enabled and the agent is accessed through the mesh service. With the mesh sidecar disabled, the data planes can resolve and connect to the Grafana host.
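To tell whether the hostname case fails at name resolution or at the TCP connect, the two steps can be timed separately. The following is a minimal sketch, assuming Python 3 is available in a debug container in the same pod. The function name resolve_then_connect is hypothetical, and the port of the trace endpoint has to be supplied by the caller. Note that it uses the system resolver (like wget/curl), not Kong's own DNS client, so it only shows what the pod itself can do with and without the sidecar.

```python
import socket
import time


def resolve_then_connect(host: str, port: int, timeout: float = 3.0) -> dict:
    """Time name resolution and TCP connect separately for host:port."""
    result = {"host": host, "port": port}

    # Step 1: resolve the name through the system resolver.
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["resolve_error"] = str(exc)
        return result
    result["resolve_seconds"] = time.monotonic() - start
    result["addresses"] = sorted({info[4][0] for info in infos})

    # Step 2: open a TCP connection to the first resolved address.
    family, socktype, proto, _, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
        result["connect_seconds"] = time.monotonic() - start
    except OSError as exc:
        result["connect_error"] = str(exc)
    return result
```

Running it once with the DNS name and once with the service IP, with the sidecar enabled and disabled, would show which step the sidecar affects.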

  • The IP 100.1.0.9 is the Kubernetes default DNS nameserver, automatically assigned to manage DNS resolution within the cluster.
  • DNS resolution works when tested manually from the Kong data plane container: wget/curl were able to resolve the name and connect. This suggests the issue is not with the DNS infrastructure itself but with how Kong handles DNS queries at runtime.
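The error text mentions the "reply length field" from a TCP server. That is the 2-byte length prefix used by DNS over TCP (RFC 1035), whereas a manual wget/curl check goes through the system resolver and would typically query over UDP. So the manual test may not exercise the same path as Kong. Below is a minimal sketch for comparing both transports against the same nameserver, again assuming Python 3 is available in a debug container in the pod. All function names are hypothetical and not part of Kong.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query (RFC 1035) for `name`; qtype 1 is an A record."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def query_tcp(server: str, name: str, port: int = 53, timeout: float = 2.0) -> bytes:
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        # This read corresponds to the "reply length field" in the error.
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def query_udp(server: str, name: str, port: int = 53, timeout: float = 2.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        data, _ = sock.recvfrom(4096)
        return data


def compare_transports(server: str, name: str) -> dict:
    """Try the same query over UDP and TCP and report what happened."""
    results = {}
    for label, query in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            results[label] = f"ok, {len(query(server, name))} bytes"
        except OSError as exc:  # includes timeouts and refused connections
            results[label] = f"failed: {exc!r}"
    return results
```

For example, compare_transports("100.1.0.9", "grafana-agent-traces.svc.cluster.local") could be run with the sidecar enabled. If UDP succeeds while TCP times out, that would point at TCP traffic to port 53 being handled differently by the mesh rather than at the DNS server itself.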

Any idea what could be causing this and how to fix it?