Hi Kong community,
We’ve published a Kong Gateway plugin for routing OpenAI-compatible chat completion traffic to one or more ollama-agent-router runtime nodes.
The plugin is available on LuaRocks: kong-plugin-ollama-agent-router
You can check the setup flow, configuration examples, and security details in the README here: kong-ollama-agent-router
What it provides:
- routes OpenAI-compatible `/v1/chat/completions` traffic through Kong to one or more `ollama-agent-router` nodes
- keeps routing decisions, model selection, runtime state, node capabilities, and async job handling behind the Kong Gateway layer
- supports static node-router discovery with weighted nodes
- can prefer already loaded models, respect node weights, and fail over on execution errors
- fetches model specs, runtime capabilities, GPU/VRAM state, queues, and job policies from each node-router instead of duplicating them in Kong config
- supports secured runtime-agent communication using bearer tokens, custom headers, per-node credentials, TLS verification, and optional client certificates
- preserves an OpenAI-compatible gateway surface for clients while delegating local machine state and execution details to the runtime nodes
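
To give a feel for what "prefer already loaded models, respect node weights, and fail over on execution errors" can mean in practice, here is a rough Python sketch. It is only an illustration of the general idea, not the plugin's actual code or algorithm: the `Node` fields, `order_candidates`, `dispatch`, and the `send` callback are all hypothetical names made up for this example, and `RuntimeError` is just a stand-in for an execution error.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node record; field names are illustrative only,
    # not the plugin's configuration schema.
    name: str
    weight: int
    loaded_models: set = field(default_factory=set)


def order_candidates(nodes, model, rng=random):
    """Return nodes in the order they would be tried for `model`."""
    remaining = [n for n in nodes if n.weight > 0]
    ordered = []
    while remaining:
        # Prefer nodes that already have the model loaded, while any remain.
        warm = [n for n in remaining if model in n.loaded_models]
        pool = warm or remaining
        # Weighted random pick within the preferred pool.
        pick = rng.choices(pool, weights=[n.weight for n in pool], k=1)[0]
        ordered.append(pick)
        remaining.remove(pick)
    return ordered


def dispatch(nodes, model, send):
    """Try each candidate in turn; fail over to the next when `send` raises."""
    last_error = None
    for node in order_candidates(nodes, model):
        try:
            return send(node, model)
        except RuntimeError as err:  # stand-in for an execution error
            last_error = err
    raise RuntimeError(f"all nodes failed for {model}") from last_error
```

In this sketch, nodes that already hold the requested model are always tried before cold ones, and weights only decide the order within each group. A real router could just as reasonably blend the two signals into a single score.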
It’s free to use and MIT-licensed. Enjoy, and all feedback is welcome.