We are doing some testing of the Kong HTTP log plugin, found out a Kong read and connection timeouts does not invoke the HTTP log plugin in the process for any valuable information. Is there a reason for that or a way to make it so the Http log plugin could send out relevant info?
I’d like to be able to see what specific error occurred too if possible.
I am working alongside Ross, and just wanted to add to this a tad with some screenshots. Proxy definition to enforce a read timeout(“upstream_read_timeout”) we set this property to 1 millisecond to force a failure:
And the Kong response behavior to the Proxy Consumer:
And so far transactions using the http-log plugin show none of this ever occurring. Now the proxy error logs do currently print to std-out, but would you agree that tcp/udp/http-log and all the rest are within the scope of documenting these sorts of errors as well? And if not, then could you guide us as to how we might branch the http-log plugin so it can log these transactions as well? Note that we are indeed also running the Kong 0.12.1 version that has this sort of short circuiting fix in place for transactions that were rejected with a 401 unauthorized (and can confirm indeed the plugins do receive logs in those cases for that fix!).
Indeed, a timeout from the ngx_http_proxy_module will make NGINX produce an HTTP 504 response, which the error_page directive in the Kong NGINX configuration will catch and redirect to our upstream error handling internal NGINX location. This location does not run anything in the NGX_HTTP_LOG_PHASE.
Thanks man! Is this something you think should be added into baseline Kong or do we need to deviate from the norm here and bite the bullet and make a custom template?
Just to explain from our perspective:
We get in a meeting where consumer is having an error, support reviews http-logs we fwd to splunk trying to find the transaction in question. Since Kong shortcuts on these cases a tech would find no trace of these transactions in splunk. Timeouts can be a pretty common occurrence when backend APIs rely on systems that get extremely sluggish at random times of stress .
To add to Ross point above I thought it worthwhile to give some insight into our Dockerfile too as that is how we make the modifications to the lua template files kong includes rather than writing our own(for the time being):
FROM docker.company.com/company_et/alpine:3.6
ENV KONG_VERSION 0.12.1
ENV KONG_SHA256 9f699e20e7d3aa6906b14d6b52cae9996995d595d646f9b10ce09c61d91a4257
USER root
RUN apk add --no-cache --virtual .build-deps wget tar ca-certificates \
&& apk add --no-cache libgcc openssl pcre perl unzip tzdata luarocks git \
&& wget -O kong.tar.gz "https://bintray.com/kong/kong-community-edition-alpine-tar/download_file?file_path=kong-community-edition-$KONG_VERSION.apk.tar.gz" \
&& echo "$KONG_SHA256 *kong.tar.gz" | sha256sum -c - \
&& tar -xzf kong.tar.gz -C /tmp \
&& rm -f kong.tar.gz \
&& cp -R /tmp/usr / \
&& rm -rf /tmp/usr \
&& cp -R /tmp/etc / \
&& rm -rf /tmp/etc \
&& apk del .build-deps
RUN mkdir /usr/local/kong
COPY docker-entrypoint.sh /docker-entrypoint.sh
#Rewrite nginx kong conf to only enable TLS1.2
RUN sed -i '/ssl_protocols TLSv1.1 TLSv1.2;/c\ ssl_protocols TLSv1.2;' /usr/local/share/lua/5.1/kong/templates/nginx_kong.lua
#Temporary fix attempt to log timeouts
RUN sed -ie '/kong.handle_error()/{n;d}' /usr/local/share/lua/5.1/kong/templates/nginx_kong.lua; sed -i '/kong.handle_error()/ a \\t}\n\n\tlog_by_lua_block {\n\t\tkong.log()\n\t}' /usr/local/share/lua/5.1/kong/templates/nginx_kong.lua
#Increase the buffer header defaults
RUN sed -i '68i client_header_buffer_size 8k;' /usr/local/share/lua/5.1/kong/templates/nginx_kong.lua
RUN sed -i '69i large_client_header_buffers 100 32k;' /usr/local/share/lua/5.1/kong/templates/nginx_kong.lua
#Rewrite ngix conf so we can see our Kong ENV variable
RUN sed -i '8ienv KONG_SSL_CERT_KEY;' /usr/local/share/lua/5.1/kong/templates/nginx.lua
#Install custom plugins with luarocks
RUN luarocks install kong-plugin-stdout-log
RUN luarocks install kong-upstream-jwt
#Install HTTP plugin
RUN git clone https://github.optum.com/DataExternalization/optum-kong-http-log-plugin.git /usr/local/share/lua/5.1/kong/plugins/optum-kong-http-log-plugin
ENTRYPOINT ["/docker-entrypoint.sh"]
EXPOSE 8000 8443 8001 8444
STOPSIGNAL SIGTERM
USER 1001
CMD ["/usr/local/openresty/nginx/sbin/nginx", "-c", "/usr/local/kong/nginx.conf", "-p", "/usr/local/kong/"]
I don’t really have the knowledge outside of developing on individual plugins how to debug further, I wonder if kong.handle_errors() short circuits method calls/blocks that come after it, nor do I know where to look to review the source code of kong.handle_errors() and how content_by_lua_block/log_by_lua_block works with these methods(are they sort of like lua “phases” for when executions should happen?
I may have spoken too fast: I tested on my side that the NGINX log phase was executed, but the kong.log() call will not execute plugins indeed, for another reason: NGINX’s error_page does an internal redirect, which causes ngx.ctx (the value storing the API and plugins to execute for this request) to be reset. Thus, no plugins will be run (Kong even lost the information relating to which API was matched in that particular request at this point).
There seems to be no easy fix for this, we’ll have to add this to our backlog and think of a workaround. Sorry for the false hope!
Thanks for your continued effort looking into this, we too are going to try to find a hack/workaround to make Kong log these scenarios via plugin so we have a full view into api transaction errors from external logging applications. If anyone at Kong starts to think up an idea let us know and we will test it!
Any updates on this front? Would it be feasible to copy the entire ngx.ctx to a global var of sorts prior to the error_page internal redirect and still execute kong.log() against that post redirect? Hoping Kong can reach full potential here on transaction logging.
What do you think about this possibly yo : https://github.com/openresty/lua-nginx-module/issues/1057 , you think this to be a valid way to do it in Kong and if we did where should this live? Could this be done in the template? Seems agentzh has known about this for awhile:
“The problem is that the nginx core clears all modules’ ctx data upon internal redirects.”
Just going to update this post so we can consider the initial problem brought up as closed:
Awesome to see @thibaultcha cook something up that will give Kong much better visibility into API transaction error logging!
Will be interesting to see if there are any performance hits here but I am going to be hopeful it does not cost much! I would pay a few milliseconds regardless to ensure better logging.
Both are quite small changes to hot code paths. And there is a third option that is a little bit of both. All solutions have good and bad sides, we need to weight them a bit.
By removing error_page, that rids yourself of the internal redirect in the first place right? But error_page is also how you explicitly state error code to direct to the kong_error_handler