Routing Issues When Sending to Multiple Kubernetes pods


#1

Hi Kong team,

Today we experienced an issue, not sure if it’s related to Kong or not - but I wanted your take.

Our Kong has several separate services that route to a Red Hat Kubernetes re-skin called OpenShift Origin (OSO).

The way OSO works is that routes are defined for the separate services hosted on it, and they all resolve to the same IP (the OSO router). The router determines which pod to send traffic to via the HTTP Host header.

So:
curl https://myapp.oso.company.com - routes to myapp pod
curl https://theirapp.oso.company.com - routes to theirapp pod

But
nslookup myapp.oso.company.com = 10.0.0.1
nslookup theirapp.oso.company.com = 10.0.0.1

so if I wanted to use the IP to make the call, I’d do this:

curl https://10.0.0.1/ -H "Host: myapp.oso.company.com"
curl https://10.0.0.1/ -H "Host: theirapp.oso.company.com"
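As a rough illustration of the Host-header routing described above, here is a minimal Python sketch. It rests on assumptions: ROUTES, RouterHandler, and demo() are made-up names, the "router" is a toy standard-library server on 127.0.0.1 rather than the real OSO router, and plain HTTP stands in for HTTPS. Both requests go to the same IP over one persistent connection and differ only in the Host header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical route table standing in for the OSO router's routes.
ROUTES = {
    "myapp.oso.company.com": "myapp pod",
    "theirapp.oso.company.com": "theirapp pod",
}


class RouterHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the client connection open between requests

    def do_GET(self):
        # Pick the backend from the Host header of *this* request.
        body = ROUTES.get(self.headers.get("Host", ""), "no route").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo():
    server = HTTPServer(("127.0.0.1", 0), RouterHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    results = []
    try:
        for host in ROUTES:
            # Same IP, same TCP connection, different Host header.
            conn.request("GET", "/", headers={"Host": host})
            results.append((host, conn.getresponse().read().decode()))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for host, pod in demo():
        print(host, "->", pod)
```

A router that inspects the Host header on every request, as this toy one does, answers correctly even when requests for different hosts share a connection.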

What we noticed is that when the proxies for both myapp and theirapp have tps > 5, requests to the myapp proxy sometimes end up routed to theirapp instead. Could you shed any light on why this might be? If it is a Kong issue, this is a serious problem.

Edit: could this possibly have something to do with keepalive?


#2

Note that we are running Kong 0.14.1, and we are working on a reproducible example of this.

Edit - Don’t think keep-alive could cause it
Edit Edit - Actually leaning towards it now. Disabling keepalive alleviated the problem here:

upstream kong_upstream {
    server 0.0.0.1;
    balancer_by_lua_block {
        Kong.balancer()
    }
    # keepalive ${{UPSTREAM_KEEPALIVE}};
}

You have

Kong -> (OpenShift Origin Router Endpoint which has round robin to HA proxies)

Then
(OpenShift Origin Router Endpoint which has round robin to HA proxies) -> Service A
(OpenShift Origin Router Endpoint which has round robin to HA proxies) -> Service B

Somehow Kong seems to be intermittently proxying transactions meant for Service A to Service B, even though routing at the routers should presumably be driven by the Host header. Keep-alive only keeps a TCP connection alive between Kong and the OpenShift Router Endpoint… Real head scratcher what could be happening here. Still trying to debug.

To put legs on it, this is what the clashing services look like:
Service One:

{
  "host": "some-service-one.origin-datacenter-core.company.com",
  "created_at": 1536789835,
  "connect_timeout": 2000,
  "id": "9e38db4b-b314-4ff2-b0c9-0f6cbc163332",
  "protocol": "https",
  "name": "some-service-one",
  "read_timeout": 9000,
  "port": 443,
  "path": "/v1",
  "updated_at": 1536789835,
  "retries": 0,
  "write_timeout": 9000,
  "extras": {
    "createdUser": null,
    "updatedUser": null,
    "id": 11,
    "service_id": "9e38db4b-b314-4ff2-b0c9-0f6cbc163332",
    "kong_node_id": "1",
    "createdAt": "2018-09-12T22:03:55.000Z",
    "updatedAt": "2018-09-12T22:03:55.000Z"
  }
}

Service two:

{
  "host": "some-service-two.origin-datacenter-core.company.com",
  "created_at": 1534369827,
  "connect_timeout": 2000,
  "id": "bb76798f-1bda-4e24-a274-4e19e40b881f",
  "protocol": "https",
  "name": "some-service-two",
  "read_timeout": 9000,
  "port": 443,
  "path": "/",
  "updated_at": 1536956373,
  "retries": 0,
  "write_timeout": 9000,
  "extras": {
    "createdUser": null,
    "updatedUser": null,
    "id": 3,
    "service_id": "bb76798f-1bda-4e24-a274-4e19e40b881f",
    "kong_node_id": "1",
    "createdAt": "2018-08-15T21:50:27.000Z",
    "updatedAt": "2018-09-14T20:19:33.000Z"
  }
}

Since they both resolve to the same IP (the router Kong connects to), maybe something glitches internally. It looks like I will have to figure it out come Monday.

So Kong connections that go to a cloud platform’s HTTPS edge-configured router specifically seem to be the culprit because of the keepalive directive: somehow the keepalive sessions get mixed up when proxying.


#3

UPDATE:

We found that when we disable the nginx upstream keepalive, this issue is resolved.

My thought is this: because we are sending to the same IP:port but with different Host headers, Kong is re-using a kept-alive TCP socket for the wrong upstream, since the balancer only passes IP:port to nginx.

QUESTION: Is there a way for us to have the keepalive session keyed on hostname:port instead of IP:port? Or to otherwise force nginx to create a new keepalive session based on the proxy which invoked it?
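To make the hypothesis above concrete, here is a toy Python model of an idle-connection cache. It is a sketch under assumptions, not nginx's or Kong's actual implementation: KeepalivePool, Connection, and the two key functions are hypothetical names, and the hostnames and IP are the example values from the first post. The only thing it shows is how the choice of cache key decides whether requests for two hostnames can share one socket.

```python
from collections import defaultdict


class Connection:
    """A stand-in for an idle upstream socket; remembers who opened it."""

    def __init__(self, opened_for):
        self.opened_for = opened_for


class KeepalivePool:
    """Toy idle-connection cache; key_fn decides which requests may share a socket."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.idle = defaultdict(list)

    def acquire(self, host, ip, port):
        bucket = self.idle[self.key_fn(host, ip, port)]
        # Reuse an idle connection if the key matches, else open a new one.
        return bucket.pop() if bucket else Connection(opened_for=host)

    def release(self, conn, host, ip, port):
        self.idle[self.key_fn(host, ip, port)].append(conn)


def by_address(host, ip, port):
    return (ip, port)


def by_host_and_address(host, ip, port):
    return (host, ip, port)


def two_requests(pool):
    """Send one request per app; both hostnames resolve to the same IP:port."""
    one = ("myapp.oso.company.com", "10.0.0.1", 443)
    two = ("theirapp.oso.company.com", "10.0.0.1", 443)
    first = pool.acquire(*one)
    pool.release(first, *one)
    second = pool.acquire(*two)
    return first, second


if __name__ == "__main__":
    for key_fn in (by_address, by_host_and_address):
        first, second = two_requests(KeepalivePool(key_fn))
        print(key_fn.__name__, "reused socket:", first is second)
```

Sharing a socket across hostnames is legal HTTP on its own. It only becomes a problem if something downstream ties routing to the connection rather than to each request.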


#4

Ultimately, what we need right now is a way to toggle upstream keepalive for certain hosts and their scheme in Kong’s nginx.conf. So far none of this seems to work, and everything I read says that if statements are bad and that nginx variables can’t be used in the http {} block
(which seems to swallow just about the entirety of the conf, yet I see Kong declare them anyway?)

upstream kong_upstream {
    server 0.0.0.1;
    balancer_by_lua_block {
        Kong.balancer()
    }

    set $keepaliveFlag Init;
    if($upstream_scheme = https)
    {
        set $keepaliveFlag "${keepaliveFlag}Https";
    }

   if ($upstream_host ~* "^(.*)(cloudenv1-datacenter1|cloudenv2-datacenter1)(.*)") 
   {
      set $keepaliveFlag "${keepaliveFlag}CloudPlatform"; 
   }
  
   #If upstream is not https OR does not contain a reference to the cloud environment host, then safe to use keepalive
   if ($flagcheck != InitHttpsCloudPlatform)
   {
      keepalive ${{UPSTREAM_KEEPALIVE}};
   }
}

Would anyone who is way better at nginx care to describe an easy way to accomplish the above? I am out of ideas.


#5

Hello,

Thank you for reporting your observations to us. We are aware of a connection pooling issue with Nginx, but it is related to TLS: another topic, most likely for another time.

I spent a good amount of time trying to reproduce what you are seeing on a plain OpenResty 1.13.6.1 instance, without any success. Is there any chance this could be an issue from the OpenShift router maybe?

I used the following configuration:

worker_processes 1;
error_log logs/error.log;

events {
    multi_accept on;
    worker_connections 4096;
}

http {
    upstream my_upstream {
        #server 127.0.0.1:9000;
        server 0.0.0.1;

        balancer_by_lua_block {
            local ngx_balancer = require "ngx.balancer"
            ngx_balancer.set_current_peer("127.0.0.1", 9000)
        }

        keepalive 4096;
    }

    server {
        server_name my_server;
        listen 8000;

        access_log logs/access.log;
        error_log logs/error.log notice;

        location / {
            proxy_pass         http://my_upstream;
            proxy_http_version 1.1;
            proxy_set_header   Host $http_host;
            proxy_set_header   Connection '';
        }
    }

    server {
        server_name hello.com;
        listen 9000;

        access_log logs/access_upstream.log;
        error_log logs/error_upstream.log debug;

        location / {
            content_by_lua_block {
                ngx.say("Host: ", ngx.var.http_host)
                if ngx.var.http_host ~= "hello.com" then
                    error("invalid host header: " .. ngx.var.http_host)
                end
            }
        }
    }

    server {
        server_name bye.com;
        listen 9000;

        access_log logs/access_upstream.log;
        error_log logs/error_upstream.log debug;

        location / {
            content_by_lua_block {
                ngx.say("Host: ", ngx.var.http_host)
                if ngx.var.http_host ~= "bye.com" then
                    error("invalid host header: " .. ngx.var.http_host)
                end
            }
        }
    }
}

I then used ab to throw a large amount of traffic at it:

$ ab -v -k -c 1000 -t 30 -H 'Host: hello.com' 'http://localhost:8000/'
$ ab -v -k -c 1000 -t 30 -H 'Host: bye.com' 'http://localhost:8000/'

I monitored TCP traffic between the server and the “upstreams”, checked the error logs for any possible mismatch, and made sure keep-alive was in effect. Nothing jumped out at me.

A next step would be to conduct the same test with Kong itself, but I wanted to check in with you about the OpenShift Router possibility first.


#6

@thibaultcha

Thanks for taking some time to try to reproduce this. Across all our clusters we have simply disabled keepalive to upstreams entirely for now, and the problem did in fact disappear completely and could not be reproduced. (With keepalive enabled, we were able to reproduce it consistently with our own tests against test apps we exposed via routes on the cloud platform.) I am still on the fence about whether it’s a Kong/Nginx issue or something to do with the other routing layers a request passes through before it reaches, say, a backend microservice pod. Let me clarify the end-to-end flow to paint a better picture:

->Kong(Running on Cloud Platform) -> (F5 LTM That Load balances “single ip/host”) -> (4 HA Proxy Routers that direct traffic to the pod applications based on the exposed routes via host header) -> OpenShift Project Pod w/ the exposed route.

Things we have still not tried: checking whether the same behavior occurs on HTTP-exposed OpenShift routes (port 80).

We initially thought edge routing, with its TLS termination, might play a role, but then we had an incident where another team with passthrough TLS routes was impacted, so we know it’s not isolated to edge routing. We will continue to investigate as we can. I don’t think disabling keepalive will really hurt anything at current volumes (500k tx a day), but in the long run I would much prefer to keep that directive intact.

F5 Config info if anything catches your eye on its persistence/profile settings:

f5_80 Profile:

IP Protocol: tcp
SAT: {"type":"snat","pool":"/Common/snatpool_highvolume_cloud_1"}
Persistence: cookie
    Type: 
Profiles:
    http-xforwardfor / Context: all
        Defaults From: http
        X Forward For: enabled
        Rewrite Redirect: none
        Fallback Host:
        Request Header Insert: 
    tcp / Context: all
        Defaults From:
        Idle Timeout: 300
        Keep Alive Interval: 1800
        Receive Window Size: 65535
        Send Buffer Size: 65535
Pool: cloud_80
    Allow NAT: yes / Allow SNAT: yes
    Load Balancing Method: least-connections-member
    Active Pool Members: 4

f5_443 Profile:

IP Protocol: tcp
SAT: {"type":"snat","pool":"/Common/snatpool_highvolume_cloud_1"}
Persistence: Source_30min
    Type: 
Profiles:
    fastL4 / Context: all
        Defaults From:
        Idle Timeout: 300
        Keep Alive Interval: disabled
        TCP Handshake Timeout: 5
        TCP Close Timeout: 5
Pool: cloud_443
    Allow NAT: yes / Allow SNAT: yes
    Load Balancing Method: least-connections-member
    Active Pool Members: 4

The only difference I see between 80 and 443 on persistence is Cookie vs Source_30min.

I do not have access to the HA proxies or their config info at this time, but I might be able to dig some up by talking with our cloud team.

If you have any ideas or theories, based on what you now know about our architecture, about what the culprit could be, feel free to share and I will follow up on those hunches. Likewise, if you have any debug lines we should add to the Kong nginx conf that might help at runtime, we can add them and report back.


#7

Thanks for sharing more details on the architecture. Looking at it, this looks less and less like an issue related to Kong at this point. Kong is maintaining a single keepalive connection and sending various HTTP requests throughout the lifetime of that connection; nothing unusual here.
The issue seems to be that a later network hop is opening a connection appropriately based on the first request sent by Kong, but then keeping that connection open without expecting that Kong will then send requests for another host through the same connection.
The most efficient way to track this down would probably be to obtain packet captures or run a packet sniffer from in and out of each of these hops.


#8

Had a really good idea: what if I bypass the F5, set the service resource to the IP of one of the HA Proxy routers, and pass in a Host header to the gateway? Will Kong overwrite that Host header, or will it let it persist transparently on the proxy call? (I’m worried it may not, which means I’ll just have to focus on packet captures.) If I set up 2 proxies like that, pointing to one of the HA proxy routers fronting the cloud platform, pass in that Host header, and don’t see the issue, I will be leaning towards something problematic in how the F5 LB is configured.

Edit - ooo looks like I may be able to override the host header by setting preserve_host to true on the route!


#9

This may work, but you will have no guarantee that Kong will reuse an already established connection. I suggest that you monitor TCP traffic going out of Kong and ensure that the requests you are making are going through an already opened connection to the HA proxy routers.

Yes :)


#10

Coming back to this a few months later:

“This may work, but you will have no guarantee that Kong will reuse an already established connection.”

Even with upstream keepalive enabled and pointing directly to an HA proxy (that already supports keepalive)? I’m slightly confused how that could be the case, because per the docs upstream keepalive will “reuse its existing connections per upstream.” Is upstream keepalive per worker process? I sort of assumed it was global to the nginx web server.
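On the per-worker question: the nginx documentation for the keepalive directive describes idle connections as being preserved in the cache of each worker process, so the cache is not shared across workers. Here is a toy Python model of what that implies for reuse. It is a sketch under assumptions: Worker and demo() are hypothetical names, and a connection is treated as going straight back to the cache after each request.

```python
class Worker:
    """Toy nginx worker with its own private idle-connection cache."""

    def __init__(self, name):
        self.name = name
        self.idle = {}   # (ip, port) -> connection label
        self.opened = 0  # how many upstream connections this worker has opened

    def proxy(self, ip, port):
        """Return (connection, reused) for one proxied request."""
        key = (ip, port)
        if key in self.idle:
            return self.idle[key], True
        self.opened += 1
        conn = f"{self.name}-conn{self.opened}"
        self.idle[key] = conn  # simplification: straight back into the cache
        return conn, False


def demo():
    w1, w2 = Worker("worker1"), Worker("worker2")
    router = ("10.0.0.1", 443)
    # Consecutive client requests may be handled by different workers.
    return [w1.proxy(*router), w2.proxy(*router), w1.proxy(*router)]


if __name__ == "__main__":
    for conn, reused in demo():
        print(conn, "reused" if reused else "new")
```

In this model the second request opens a new connection even though worker1 already holds an idle one, which is one way a given request can miss an established connection.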

We have had time to think on it some more and review configs around the HA proxies fronting our OpenShift cluster.

First of all most importantly is this setting:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4

They had no such option set, so option http-keep-alive is the default. They also had an http-keep-alive timeout of 300s set.

What does this mean? It means the HA proxy gets a connection from the client and holds that open, as well as an active connection to the OpenShift server pod it routes to. Hence, when a subsequent call comes from the client with a different Host header (the value these HA proxy routers use to decide where to route my call), the HA proxy still has that keepalive connection to the server, so it routes the call to the incorrect pod over the same active client connection.
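The behavior described above can be sketched as a toy Python model. It only models the theory in this post. Whether HAProxy actually behaves this way depends on its mode and version, which this thread has not confirmed. Router, replay(), and the pod names are hypothetical, and the hostnames are the two example services from earlier in the thread.

```python
ROUTES = {
    "some-service-one.origin-datacenter-core.company.com": "pod-one",
    "some-service-two.origin-datacenter-core.company.com": "pod-two",
}


class Router:
    """Toy router; pin_per_connection models the suspected misbehavior."""

    def __init__(self, routes, pin_per_connection):
        self.routes = routes
        self.pin_per_connection = pin_per_connection
        self.pinned = {}  # client connection id -> backend chosen on first request

    def handle(self, conn_id, host):
        if self.pin_per_connection:
            # Backend is decided once, by the first request seen on this connection.
            return self.pinned.setdefault(conn_id, self.routes[host])
        # Backend is decided from the Host header of every request.
        return self.routes[host]


def replay(router):
    """Two requests for different hosts arriving on the same kept-alive connection."""
    return [router.handle(conn_id=1, host=h) for h in ROUTES]


if __name__ == "__main__":
    print("pinned:     ", replay(Router(ROUTES, pin_per_connection=True)))
    print("per request:", replay(Router(ROUTES, pin_per_connection=False)))
```

With pinning, the second request lands on pod-one even though its Host header names service two, which matches the symptom. Routing per request sends it to pod-two.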

I did suggest an alternative to them, option http-server-close, which maintains active client connections while not keeping the connections to the servers themselves alive. I also saw it discussed here:

While the above is a solution, any private cloud platform is shared infrastructure, so I am not sure they can make that change just to solve my use case. Ideally, it seems reasonable for the HA proxy to be allowed to maintain active connections to both clients and servers, not JUST the client connections.

So ultimately my question is this: is it possible in Kong land, leveraging Lua or some other technique, to adjust nginx under the hood so that it does not key keepalive connections on ip:port alone, but instead establishes a brand new keepalive connection by taking the hostname into consideration as well?

Or does this question have to be directed at the NGINX team themselves, for a keepalive implementation (maybe a directive) that could support such a flow (new keepalive connections per hostname rather than pure ip:port)?

Thoughts? I can’t imagine we are the first to ever leverage Kong fronting a private cloud platform where you have:

Kong -> F5(LB, one IP) -> 4+ HA proxies -> Pods(Applications on Openshift cloud platform)

Maybe a better solution I am not seeing?

Of course I can keep upstream keepalive disabled (as we do now) and ALWAYS open a new connection for every request, but that seems horridly inefficient.