Interestingly enough, at a decent rate of 1k+ TPS this issue still rears its head. The keepalive change I mentioned in the comment above seems to have helped to a degree, but I still see results like this:
2018/04/12 02:39:49 [error] 56#0: *10324471 peer closed connection in SSL handshake (104: Connection reset by peer), context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 56#0: *10324471 [lua] handler.lua:85: [http-log] failed to do SSL handshake with splunk.company.com:443: handshake failed, context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 56#0: *10324471 [lua] handler.lua:91: [http-log] failed to send data to splunk.company.com:443: closed, context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 55#0: *10324478 peer closed connection in SSL handshake (104: Connection reset by peer), context: ngx.timer, client: 10.129.4.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 55#0: *10324478 [lua] handler.lua:85: [http-log] failed to do SSL handshake with splunk.company.com:443: handshake failed, context: ngx.timer, client: 10.129.4.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 55#0: *10324478 [lua] handler.lua:91: [http-log] failed to send data to splunk.company.com:443: closed, context: ngx.timer, client: 10.129.4.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 54#0: *10324489 peer closed connection in SSL handshake (104: Connection reset by peer), context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 54#0: *10324489 [lua] handler.lua:85: [http-log] failed to do SSL handshake with splunk.company.com:443: handshake failed, context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 54#0: *10324489 [lua] handler.lua:91: [http-log] failed to send data to splunk.company.com:443: closed, context: ngx.timer, client: 10.129.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 57#0: *10324485 peer closed connection in SSL handshake (104: Connection reset by peer), context: ngx.timer, client: 10.131.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 57#0: *10324485 [lua] handler.lua:85: [http-log] failed to do SSL handshake with splunk.company.com:443: handshake failed, context: ngx.timer, client: 10.131.2.1, server: 0.0.0.0:8443
2018/04/12 02:39:49 [error] 57#0: *10324485 [lua] handler.lua:91: [http-log] failed to send data to splunk.company.com:443: closed, context: ngx.timer, client: 10.131.2.1, server: 0.0.0.0:8443
Current code:
-- Log to an HTTP endpoint.
-- This is structured as a timer callback.
-- @param `premature` see the OpenResty ngx.timer.at function
-- @param `conf` plugin configuration table, holds HTTP endpoint details
-- @param `body` raw HTTP body to be logged
-- @param `name` the plugin name (used for logging purposes in case of errors, etc.)
local function log(premature, conf, body, name)
  if premature then
    return
  end
  name = "[" .. name .. "] "

  local ok, err
  local parsed_url = parse_url(conf.http_endpoint)
  local host = parsed_url.host
  local port = tonumber(parsed_url.port)

  local sock = ngx.socket.tcp()
  sock:settimeout(conf.timeout)

  ok, err = sock:connect(host, port)
  if not ok then
    ngx.log(ngx.ERR, name .. "failed to connect to " .. host .. ":" .. tostring(port) .. ": ", err)
    return
  end

  if parsed_url.scheme == HTTPS then
    local _, err = sock:sslhandshake(true, host, false)
    if err then
      ngx.log(ngx.ERR, name .. "failed to do SSL handshake with " .. host .. ":" .. tostring(port) .. ": ", err)
    end
  end

  ok, err = sock:send(generate_post_payload(conf.method, conf.content_type, parsed_url, body, conf.splunk_access_token))
  if not ok then
    ngx.log(ngx.ERR, name .. "failed to send data to " .. host .. ":" .. tostring(port) .. ": ", err)
  end

  -- ok, err = sock:setkeepalive(conf.keepalive)
  -- if not ok then
  --   ngx.log(ngx.ERR, name .. "failed to keepalive to " .. host .. ":" .. tostring(port) .. ": ", err)
  --   return
  -- end
end
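One thing I notice in the callback above (not claiming it is the root cause of the resets): when `sslhandshake` fails, the error is logged but the function falls through and still calls `send` on the broken socket, which is why every handshake failure in the log is paired with a "failed to send data ... closed" line. A sketch of the tail of the function that bails out early on handshake failure and, on success, returns the connection to the per-worker cosocket pool via `setkeepalive`, so subsequent timer runs can reuse the TLS connection instead of paying a fresh handshake. This assumes the same `conf`, `parsed_url`, `host`, `port`, and `sock` variables as above, and that `conf.keepalive` is the keepalive idle timeout in milliseconds:

```lua
-- Sketch only; replaces the HTTPS branch and everything after it.
if parsed_url.scheme == HTTPS then
  local session, err = sock:sslhandshake(true, host, false)
  if not session then
    ngx.log(ngx.ERR, name .. "failed to do SSL handshake with " .. host .. ":" .. tostring(port) .. ": ", err)
    sock:close()
    return  -- don't fall through: send() on a failed handshake just adds a "closed" error
  end
end

ok, err = sock:send(generate_post_payload(conf.method, conf.content_type, parsed_url, body, conf.splunk_access_token))
if not ok then
  ngx.log(ngx.ERR, name .. "failed to send data to " .. host .. ":" .. tostring(port) .. ": ", err)
  return
end

-- Put the connection (TLS state included) back into the per-worker pool instead
-- of letting it be closed; the next timer run's connect() to the same host:port
-- can then reuse it and skip both the TCP connect and the TLS handshake.
ok, err = sock:setkeepalive(conf.keepalive)
if not ok then
  ngx.log(ngx.ERR, name .. "failed to set keepalive for " .. host .. ":" .. tostring(port) .. ": ", err)
end
```

With reuse working, each worker should only handshake when the pool is empty or an idle connection has timed out, which ought to cut the handshake volume the upstream sees at 1-5k TPS considerably.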
Do any experts on https://github.com/openresty/lua-nginx-module#ngxsockettcp have an idea of what is going on, and a remedy? I would love to achieve zero dropped transactions.
My KONG_LUA_SOCKET_POOL_SIZE is set to 30; is that a reasonable value for an 8-worker-process node handling 1-5k TPS? Something has to give here.
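For context on that number, my understanding is that `lua_socket_pool_size` is a per-worker, per-"host:port" cap on idle pooled connections, so 8 workers at 30 each allows up to 240 idle connections to the Splunk endpoint; connections beyond the cap are simply closed rather than erroring. A hypothetical fragment of the generated nginx config (I believe KONG_LUA_SOCKET_POOL_SIZE maps to this directive via Kong's template, but that mapping is an assumption on my part):

```nginx
# Per-worker cosocket pool tuning (values are illustrative):
lua_socket_pool_size 30;            # idle connections kept per worker, per upstream host:port
lua_socket_keepalive_timeout 60s;   # how long an idle pooled connection is retained
```

Note the pool only ever holds idle connections, so it does not limit concurrent in-flight sends; raising it mainly helps when many timer callbacks finish around the same time and would otherwise discard reusable connections.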
I would also like to point out that when I see these errors, it is not 100% of transaction logging failing, just a large chunk. Interestingly, I also run the statsd plugin against an app sidecar to Kong that streams metrics into Grafana, and it reports the correct TPS. When it was reporting around 5k TPS (the true rate), Splunk was receiving about 3k, so roughly 40% transaction loss at high TPS over HTTPS.