Active health checks never set a target back to "Healthy" automatically after a "down" simulation

Hey folks!

I’m using Kong OSS in a Docker container and I’m facing some issues when I try to enable the “Load Balancing” available through the Upstreams feature.

First of all, my configuration:

SERVICES

{
  "host": "my_simple_api_upstream",
  "id": "f0f273cd-df43-48d2-8374-66848cd16f2c",
  "protocol": "https",
  "read_timeout": 60000,
  "tls_verify_depth": null,
  "port": 8000,
  "updated_at": 1626319456,
  "ca_certificates": null,
  "created_at": 1626318853,
  "connect_timeout": 60000,
  "write_timeout": 60000,
  "name": "my_simple_api",
  "retries": 5,
  "path": "/",
  "tls_verify": null,
  "tags": [],
  "client_certificate": null,
  "extras": {
    "createdUser": null,
    "updatedUser": null,
    "id": 3,
    "service_id": "f0f273cd-df43-48d2-8374-66848cd16f2c",
    "kong_node_id": "1",
    "description": null,
    "tags": null,
    "createdAt": "2021-07-15T03:14:13.000Z",
    "updatedAt": "2021-07-15T03:24:16.000Z"
  }
}

ROUTES

{
  "strip_path": true,
  "tags": null,
  "updated_at": 1626319001,
  "destinations": null,
  "headers": null,
  "protocols": [
    "http",
    "https"
  ],
  "methods": null,
  "service": {
    "host": "my_simple_api_upstream",
    "id": "f0f273cd-df43-48d2-8374-66848cd16f2c",
    "protocol": "https",
    "read_timeout": 60000,
    "tls_verify_depth": null,
    "port": 8000,
    "updated_at": 1626319456,
    "ca_certificates": null,
    "created_at": 1626318853,
    "connect_timeout": 60000,
    "write_timeout": 60000,
    "name": "my_simple_api",
    "retries": 5,
    "path": "/",
    "tls_verify": null,
    "tags": [],
    "client_certificate": null,
    "extras": {
      "createdUser": null,
      "updatedUser": null,
      "id": 3,
      "service_id": "f0f273cd-df43-48d2-8374-66848cd16f2c",
      "kong_node_id": "1",
      "description": null,
      "tags": null,
      "createdAt": "2021-07-15T03:14:13.000Z",
      "updatedAt": "2021-07-15T03:24:16.000Z"
    }
  },
  "snis": null,
  "hosts": null,
  "name": "my_simple_api_route",
  "path_handling": "v1",
  "paths": [
    "/mysimpleapi"
  ],
  "preserve_host": false,
  "regex_priority": 0,
  "response_buffering": true,
  "sources": null,
  "id": "1de27a8e-3aee-4513-866a-f3104db41522",
  "https_redirect_status_code": 426,
  "request_buffering": true,
  "created_at": 1626319001
}

UPSTREAMS

{
  "client_certificate": null,
  "created_at": 1626319334,
  "id": "f1af9630-5fda-42ea-ad9a-10f9497e5c95",
  "tags": null,
  "name": "my_simple_api_upstream",
  "algorithm": "round-robin",
  "hash_on_header": null,
  "hash_fallback_header": null,
  "host_header": null,
  "hash_on_cookie": null,
  "healthchecks": {
    "threshold": 0,
    "active": {
      "unhealthy": {
        "http_statuses": [
          429,
          404,
          500,
          501,
          502,
          503,
          504,
          505
        ],
        "tcp_failures": 0,
        "timeouts": 2,
        "http_failures": 2,
        "interval": 7
      },
      "type": "http",
      "http_path": "/",
      "timeout": 1,
      "healthy": {
        "successes": 5,
        "interval": 5,
        "http_statuses": [
          200,
          302
        ]
      },
      "https_sni": null,
      "https_verify_certificate": false,
      "concurrency": 10
    },
    "passive": {
      "unhealthy": {
        "http_failures": 0,
        "http_statuses": [
          429,
          500,
          503
        ],
        "tcp_failures": 0,
        "timeouts": 0
      },
      "healthy": {
        "http_statuses": [
          200,
          201,
          202,
          203,
          204,
          205,
          206,
          207,
          208,
          226,
          300,
          301,
          302,
          303,
          304,
          305,
          306,
          307,
          308
        ],
        "successes": 0
      },
      "type": "http"
    }
  },
  "hash_on_cookie_path": "/",
  "hash_on": "none",
  "hash_fallback": "none",
  "slots": 1000
}
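
For completeness, the active health-check settings above can also be applied directly through the Admin API. This is only a sketch; it assumes the Admin API is published on localhost:8001 (Kong’s default):

curl -i -X PATCH http://localhost:8001/upstreams/my_simple_api_upstream \
  -H "Content-Type: application/json" \
  -d '{
    "healthchecks": {
      "active": {
        "type": "http",
        "http_path": "/",
        "timeout": 1,
        "concurrency": 10,
        "healthy": { "interval": 5, "successes": 5, "http_statuses": [200, 302] },
        "unhealthy": { "interval": 7, "timeouts": 2, "http_failures": 2,
                       "http_statuses": [429, 404, 500, 501, 502, 503, 504, 505] }
      }
    }
  }'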

I’m using the “Active health checks” configuration with two targets running locally on the HOST (outside the network used by the Kong container):

  1. https://localhost:5001
  2. https://localhost:6001

In the Upstream configuration I registered the targets as host.docker.internal (resolved through the hosts file on Windows) so the Kong container can “touch” these targets/services (the sketch after this list shows how they were added):

  1. https://host.docker.internal:5001
  2. https://host.docker.internal:6001
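
For reference, this is roughly how the two targets were registered (a minimal sketch, again assuming the Admin API on localhost:8001):

curl -i -X POST http://localhost:8001/upstreams/my_simple_api_upstream/targets \
  --data "target=host.docker.internal:5001"

curl -i -X POST http://localhost:8001/upstreams/my_simple_api_upstream/targets \
  --data "target=host.docker.internal:6001"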

With this initial Upstream configuration everything works fine; the Kong algorithm can balance between these two targets. But when I enable the “Active health checks” I get unexpected behavior when I simulate a “shutdown” of one of these targets.

When I activate the health check mechanism, Kong sets both targets as “Healthy”, great. But when I simulate a “broken target” (shutting down one of the localhost APIs, for example), the Kong “checkers” set that target to “Unhealthy” and never set it back to “Healthy” again.
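
A handy way to watch this is the upstream health endpoint, which reports the health checker’s view of each target (same assumption about the Admin API address):

curl -s http://localhost:8001/upstreams/my_simple_api_upstream/health

Each target comes back with a health field such as HEALTHY or UNHEALTHY; in my case the “broken” target stays UNHEALTHY even after the local API is running again.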

Kong container logs sample


2021/07/15 04:15:24 [warn] 28#0: *62794 [lua] ring.lua:246: redistributeIndices(): [upstream:a_upstream 1] redistributed indices, size=1000, dropped=0, assigned=0, left unassigned=1000, context: ngx.timer, client: 172.24.0.1, server: 0.0.0.0:8000

2021/07/15 04:15:27 [error] 28#0: *226258 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) failed to receive status line from 'host.docker.internal (host.docker.internal:5001)': closed, context: ngx.timer, client: 172.24.0.1, server: 0.0.0.0:8000

2021/07/15 04:15:27 [error] 28#0: *226258 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) failed to receive status line from 'host.docker.internal (host.docker.internal:6001)': closed, context: ngx.timer, client: 172.24.0.1, server: 0.0.0.0:8000

2021/07/15 04:15:27 [error] 28#0: *226261 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) failed to receive status line from 'host.docker.internal (host.docker.internal:5001)': closed, context: ngx.timer, client: 172.24.0.1, server: 0.0.0.0:8000

2021/07/15 04:15:27 [error] 28#0: *226261 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) failed to receive status line from 'host.docker.internal (host.docker.internal:6001)': closed, context: ngx.timer, client: 172.24.0.1, server: 0.0.0.0:8000

If we set these targets back to “Healthy” using the Kong Admin API directly, everything works fine and the balancing works again:

**Konga Request:**
http://localhost:1337/kong/upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/6efe54ec-084b-4a71-8803-b3e6b4380c4e/healthy
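
The equivalent call straight against the Kong Admin API (same target and upstream IDs; Admin API assumed on localhost:8001):

curl -i -X POST http://localhost:8001/upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/6efe54ec-084b-4a71-8803-b3e6b4380c4e/healthy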

Logs:

2021/07/15 04:22:42 [warn] 23#0: *237542 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) healthy forced for host.docker.internal host.docker.internal:6001, client: 172.24.0.2, server: kong_admin, request: "POST /upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/6efe54ec-084b-4a71-8803-b3e6b4380c4e/healthy HTTP/1.1", host: "kong:8001",
172.24.0.2 - - [15/Jul/2021:04:22:42 +0000] "POST /upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/6efe54ec-084b-4a71-8803-b3e6b4380c4e/healthy HTTP/1.1" 204 0 "-" "-",
2021/07/15 04:22:43 [warn] 25#0: *237547 [lua] balancer.lua:313: [healthchecks] failed setting peer status (upstream: 116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream): no peer found by name 'host.docker.internal' and address host.docker.internal:6001, context: ngx.timer,
2021/07/15 04:22:46 [warn] 23#0: *237619 [lua] healthcheck.lua:1096: log(): [healthcheck] (116726d8-af6f-46bb-b779-ce3ed3110c41:my_simple_api_upstream) healthy forced for host.docker.internal host.docker.internal:5001, client: 172.24.0.2, server: kong_admin, request: "POST /upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/18dafd82-fc30-4dde-a28c-9ad5560a0d55/healthy HTTP/1.1", host: "kong:8001",
172.24.0.2 - - [15/Jul/2021:04:22:46 +0000] "POST /upstreams/f1af9630-5fda-42ea-ad9a-10f9497e5c95/targets/18dafd82-fc30-4dde-a28c-9ad5560a0d55/healthy HTTP/1.1" 204 0 "-" "-"

Infra/Config

OS = Windows 10 Pro 21H1, build: 19043.1083
Kong = 2.5.0 (tested with 2.3.2-alpine too) [Container]
PostgreSQL = 9.5 [Container]
Docker Engine = v20.10.7

What am I missing? Is there some network configuration needed between the HOST and the Docker container used by Kong? :laughing:

Any help is welcome! :ghost:

Cheeeeeeeeeeeeeeeeeeeeeeeeeers!