Seeking additional details around DNS resolution

Hi,

This topic began as a GitHub issue, but probably works better in this forum. After reading about DNS-based load balancing caveats, I’m looking for additional information on the following scenarios:

  1. If the dns_order configuration option contains LAST and DNS resolution for a service temporarily fails, Kong reverts to using the value of the last successful resolution attempt. In my environment, Kong relies on A and SRV records via Consul DNS to provide IP address and port information for a service. If resolution of a SRV record temporarily fails but the A record succeeds, Kong reverts to routing that particular service to the correct IP address but incorrect port, usually 80. If this occurs, use of LAST seems to prevent Kong from attempting DNS resolution for the service again, even after the TTL expires for any DNS records. The only solution for this issue involves restarting Kong. In my opinion, Kong should continue to attempt DNS resolution for a service.

  2. If more than three instances of a service exist, Kong only receives DNS records for three of them. Thus, Kong only adds routes for three of them. After the TTL expires, Kong should perform DNS resolution again which usually returns a different set of values. If these values differ from the existing values, does Kong update routing for the service to use the new values?

  3. If more than three instances of a service exist, the documentation suggests running enough instances of Kong to resolve all of the service instances. For example, six instances of a service requires two instances of Kong to resolve all of the DNS records. However, DNS resolution returns random values, so how does running two instances of Kong guarantee routing to all six instances of a service?

Thanks,
Matt

If resolution of a SRV record temporarily fails but the A record succeeds, Kong reverts to routing that particular service to the correct IP address but incorrect port, usually 80.

This is expected, since an A record does not carry port information.

The only solution for this issue involves restarting Kong. In my opinion, Kong should continue to attempt DNS resolution for a service.

No. In this case the DNS server clearly responded with a “3 name error”, meaning “I do not know that name”. That is primarily a DNS server problem. You cannot expect Kong to interpret the results to mean something else.
You can simply remove the LAST option from the `dns_order` property to force Kong to bypass this optimization and revert to SRV the next time the TTL expires.
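To make the effect of LAST concrete, here is a toy model in Python. It is not Kong's resolver code: the `resolve` function, the record table, and the name `web.service.consul` are all hypothetical, and the model only assumes what is described above, namely that LAST means "try the record type that last succeeded first".

```python
def resolve(name, order, records, last_success):
    """Toy resolver: try record types in `order` and return the first hit.

    'LAST' expands to whichever record type last succeeded for `name`.
    `records` maps (name, type) -> answer; a missing key models a failed lookup.
    """
    types = []
    for entry in order:
        if entry == "LAST":
            if name in last_success:
                types.append(last_success[name])
        else:
            types.append(entry)
    for rtype in types:
        answer = records.get((name, rtype))
        if answer is not None:
            last_success[name] = rtype  # remembered for the next LAST lookup
            return rtype, answer
    return None, None


name = "web.service.consul"  # hypothetical service name
last = {}

# SRV lookup temporarily fails, so only the A record is available.
records = {(name, "A"): "10.0.0.5"}
first = resolve(name, ["LAST", "SRV", "A"], records, last)

# SRV is available again, but LAST keeps preferring the A record.
records[(name, "SRV")] = "10.0.0.5:8443"
with_last = resolve(name, ["LAST", "SRV", "A"], records, last)

# Without LAST, the SRV record is tried first again.
without_last = resolve(name, ["SRV", "A"], records, last)
```

In this model `with_last` is still answered from the A record (so the port is lost), while `without_last` goes back to the SRV record.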

  2. If more than three instances of a service exist, Kong only receives DNS records for three of them. Thus, Kong only adds routes for three of them. After the TTL expires, Kong should perform DNS resolution again which usually returns a different set of values. If these values differ from the existing values, does Kong update routing for the service to use the new values?

Yes, it does. There is one catch here: if you use the “consistent hashing” setting in the balancer, you might lose some consistency, since requests get remapped to other nodes.

  3. If more than three instances of a service exist, the documentation suggests running enough instances of Kong to resolve all of the service instances. For example, six instances of a service requires two instances of Kong to resolve all of the DNS records. However, DNS resolution returns random values, so how does running two instances of Kong guarantee routing to all six instances of a service?

No guarantees. Assuming the DNS server responds with randomized results of at most 3 entries, then given enough queries you will get all instances. But there are no guarantees. The more independent queries you do, the more likely you are to get a proper distribution of traffic over your backends.
Important to know here is that the Kong DNS cache is per worker. So even if you have a single Kong node, you still have multiple workers (by default 1 worker per CPU core), each doing its own DNS resolution. So usually this problem is automatically solved.
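To put a rough number on "no guarantees", the sketch below computes the chance that a given number of independent queries together return every instance. It rests on two assumptions that are not established in this thread: each response is a uniformly random subset of the instances, and each worker's query is independent of the others. The function name is made up for illustration.

```python
from math import comb


def coverage_probability(instances: int, per_response: int, queries: int) -> float:
    """Probability that `queries` independent responses, each a uniformly
    random subset of `per_response` out of `instances`, cover every instance.
    """
    if queries < 1:
        return 0.0
    total = comb(instances, per_response)
    prob = 0.0
    # Inclusion-exclusion over the number of instances that never show up.
    for missed in range(instances - per_response + 1):
        never_seen = comb(instances - missed, per_response) / total
        prob += (-1) ** missed * comb(instances, missed) * never_seen ** queries
    return prob


# Six instances, three records per response, as in the question.
for workers in (2, 4, 8, 16):
    print(workers, round(coverage_probability(6, 3, workers), 3))
```

Under these assumptions, two independent resolvers see all six instances only when their answers happen to be exact complements, which is 1 case in 20. The probability climbs quickly as the number of independent resolvers (workers or nodes) grows.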

hth

Some great advice and clarity in here @Tieske - I’ve taken the liberty of attempting to capture it in our docs via https://github.com/Kong/docs.konghq.com/pull/922 - your comments on that PR are encouraged!

Hi Tieske,

Thanks for the clarification. The lua-resty-dns library in OpenResty seems to support initial TCP queries (via tcp_query method) and reversion to TCP if the DNS server returns a truncation flag via UDP query. The Consul DNS server returns a maximum of three values for a particular record via UDP query to prevent returning a truncation flag. Thus, the Kong DNS client never reverts to TCP to retrieve all of the values for a particular record. Adding a configuration option to enable initial TCP queries would resolve this problem. Alternatively, the Consul DNS server (and probably all modern DNS servers) supports EDNS which can return a nearly arbitrary quantity of records and values via UDP query. However, the lua-resty-dns library does not support EDNS.
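For reference, the truncation flag lives in the 12-byte DNS message header defined by RFC 1035. The Python sketch below is only an illustration of how a client could decide to retry over TCP; it is not lua-resty-dns code, and the helper names are made up.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit flags field of the DNS header


def is_truncated(response: bytes) -> bool:
    """Return True if the DNS response header has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def answer_count(response: bytes) -> int:
    """Return ANCOUNT, the number of answer records announced in the header."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    return struct.unpack("!H", response[6:8])[0]


def should_retry_over_tcp(udp_response: bytes) -> bool:
    """A client falls back to TCP only when the server signals truncation."""
    return is_truncated(udp_response)
```

If the server trims the answer to three records without setting the TC bit, `should_retry_over_tcp` stays false and the client never learns that more records exist, which is the situation described above.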

Matt

Created an issue here: https://github.com/Kong/lua-resty-dns-client/issues/63

@ionosphere80 thx for the suggestion.

Hi, in and around this thread:

Consul has an option to return the truncation flag when the response payload would be greater than 512 bytes.
This should be standards-compliant behaviour, but Consul needs that setting enabled to behave this way.

hashicorp/consul#376