Note: we are running Kong 0.14.1 and are working on reproducing examples of this.
Edit - I don't think keep-alive could cause it.
Edit 2 - Actually, I'm leaning towards it now. Disabling keepalive alleviated the problems here:
upstream kong_upstream {
    server 0.0.0.1;
    balancer_by_lua_block {
        Kong.balancer()
    }
    # keepalive ${{UPSTREAM_KEEPALIVE}};
}
You have
Kong -> (OpenShift Origin Router Endpoint which has round robin to HA proxies)
Then
(OpenShift Origin Router Endpoint which has round robin to HA proxies) -> Service A
(OpenShift Origin Router Endpoint which has round robin to HA proxies) -> Service B
Somehow Kong seems to be intermittently proxying transactions meant for Service A to Service B. Routing at the routers should presumably be driven by the Host header, and keep-alive only keeps a TCP connection open between Kong and the OpenShift Router Endpoint… It's a real head-scratcher what could be happening. Still trying to debug.
To put legs on it, this is what the clashing services look like:
Service One:
{
    "host": "some-service-one.origin-datacenter-core.company.com",
    "created_at": 1536789835,
    "connect_timeout": 2000,
    "id": "9e38db4b-b314-4ff2-b0c9-0f6cbc163332",
    "protocol": "https",
    "name": "some-service-one",
    "read_timeout": 9000,
    "port": 443,
    "path": "/v1",
    "updated_at": 1536789835,
    "retries": 0,
    "write_timeout": 9000,
    "extras": {
        "createdUser": null,
        "updatedUser": null,
        "id": 11,
        "service_id": "9e38db4b-b314-4ff2-b0c9-0f6cbc163332",
        "kong_node_id": "1",
        "createdAt": "2018-09-12T22:03:55.000Z",
        "updatedAt": "2018-09-12T22:03:55.000Z"
    }
}
Service Two:
{
    "host": "some-service-two.origin-datacenter-core.company.com",
    "created_at": 1534369827,
    "connect_timeout": 2000,
    "id": "bb76798f-1bda-4e24-a274-4e19e40b881f",
    "protocol": "https",
    "name": "some-service-two",
    "read_timeout": 9000,
    "port": 443,
    "path": "/",
    "updated_at": 1536956373,
    "retries": 0,
    "write_timeout": 9000,
    "extras": {
        "createdUser": null,
        "updatedUser": null,
        "id": 3,
        "service_id": "bb76798f-1bda-4e24-a274-4e19e40b881f",
        "kong_node_id": "1",
        "createdAt": "2018-08-15T21:50:27.000Z",
        "updatedAt": "2018-09-14T20:19:33.000Z"
    }
}
Since they both resolve to the same IP (the router Kong connects to), maybe something glitches internally. It looks like I will have to figure it out come Monday.
So Kong connections that go to a cloud platform's HTTPS edge-configured router seem to be the specific culprit because of the keepalive directive: somehow the keepalive sessions get mixed up when proxying.
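To make that suspicion concrete, here is a toy Python sketch of how a keepalive pool could mix things up. It is not Kong or nginx code and it is not a confirmed root cause. Everything in it is hypothetical: the class and function names, the router IP 10.0.0.5, and above all the assumption that whatever picks the backend is fixed when the connection is opened rather than per request. It only shows that a pool keyed by IP and port alone can hand a connection opened for one host to a request meant for another, while a pool that also keys on the host cannot.

```python
from collections import defaultdict

# Hypothetical values for illustration only.
ROUTER_IP = "10.0.0.5"  # assumed single router IP both hosts resolve to
SERVICE_ONE = "some-service-one.origin-datacenter-core.company.com"
SERVICE_TWO = "some-service-two.origin-datacenter-core.company.com"


class Connection:
    """A kept-alive connection that remembers the host it was opened for."""

    def __init__(self, ip, port, opened_for):
        self.ip = ip
        self.port = port
        self.opened_for = opened_for


class KeepalivePool:
    """Idle-connection pool; key_fn decides which requests may share a connection."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.idle = defaultdict(list)

    def acquire(self, ip, port, host):
        key = self.key_fn(ip, port, host)
        if self.idle[key]:
            # Reuse an idle connection that matches the pool key.
            return self.idle[key].pop()
        return Connection(ip, port, opened_for=host)

    def release(self, conn, host):
        self.idle[self.key_fn(conn.ip, conn.port, host)].append(conn)


def proxy(pool, host):
    """Send one request and report which host the connection was opened for."""
    conn = pool.acquire(ROUTER_IP, 443, host)
    served_by = conn.opened_for
    pool.release(conn, host)
    return served_by


def run(key_fn):
    """Proxy one request per service through a fresh pool."""
    pool = KeepalivePool(key_fn)
    return [(host, proxy(pool, host)) for host in (SERVICE_ONE, SERVICE_TWO)]


# Pool keyed by address only vs. address plus host.
by_addr = run(lambda ip, port, host: (ip, port))
by_addr_and_host = run(lambda ip, port, host: (ip, port, host))

for label, result in (("ip:port", by_addr), ("ip:port:host", by_addr_and_host)):
    for wanted, got in result:
        print(label, "wanted", wanted, "got", got)
```

With the address-only key, the second request (for some-service-two) reuses the connection that was opened for some-service-one. With the host in the key, each request gets its own connection. Whether Kong 0.14.1 actually behaves like the first case still needs to be verified.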