A few words on regex URI usage and common pitfalls

After recent discussions with some of our users and customers about regex URI usage and gotchas, it was brought to our attention that our documentation on the subject is rather sparse. Until we improve it, my coworker @Cooper suggested publishing a wrap-up about regex URIs here, so that you can make the best use of this awesome feature!

So here we go: a few words on recommended practices, pitfalls and upcoming improvements to regex URIs.

Refresher

The Kong API entity (configurable via the /apis endpoint of the Admin API) has three attributes that define how a client request is matched to an API, and therefore proxied to its upstream_url: hosts, methods, and uris.

The documentation on https://getkong.org describes these attributes in the Admin API Reference[1] and their evaluation order in the Proxy Reference[2].

Long story short: the uris attribute can contain plain URL paths (e.g. /users) as well as regexes (e.g. /users/\d+/profile). When Kong detects a regex (we will see how below), it evaluates it with the PCRE[3] engine. PCRE is a Kong dependency and ships with the Kong distribution packages.

Regex URI validation rules

How does Kong recognize that a given URI is a regex?

  1. Some validation rules apply to all URIs, whether plain or regexes (e.g. must start with /)
  2. All URIs are considered “plain” (prefix matching) until a non-RFC 3986[4] character is found, i.e. a character that is neither alphanumeric nor one of . - _ ~ / %
  3. When the URI contains any character outside of this set, Kong considers it a regex URI
  4. Kong runs a “dummy” execution of the PCRE engine against an empty subject ("") to make sure that the regex is valid (we do not re-implement our own PCRE parser, for obvious reasons)
  5. Upon any error from the dummy execution, Kong returns a validation error stating: “PCRE returned …”
  6. If all of the above succeeded, Kong performs subsequent checks (see reference code)
  7. If the URI has a trailing slash (whether it is plain or regex), it gets stripped out
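As a rough illustration, here is a plain-Lua sketch of the classification rules above. `classify_uri` is a hypothetical helper, not Kong's actual code. It assumes the plain character set described in rule 2 and assumes that a bare "/" is never stripped. It skips rules 4 to 6, because the dummy PCRE execution needs ngx.re, which is only available inside OpenResty/ngx_lua.

```lua
-- Hypothetical sketch of the URI classification rules (not Kong's code).
-- Assumption: the "plain" set is alphanumerics plus . - _ ~ / %
-- Rules 4-6 (dummy PCRE execution, further checks) are omitted here.
local function classify_uri(uri)
  -- Rule 1: every URI, plain or regex, must start with "/"
  if uri:sub(1, 1) ~= "/" then
    return nil, "must start with /"
  end

  -- Rules 2 and 3: a single character outside the plain set makes it a regex.
  -- Inside a Lua pattern set, %w is alphanumeric and %. %- %% are literals.
  local kind = uri:find("[^%w%.%-_~/%%]") and "regex" or "plain"

  -- Rule 7: strip a trailing slash (assumption: a bare "/" is kept as-is)
  if #uri > 1 and uri:sub(-1) == "/" then
    uri = uri:sub(1, -2)
  end

  return uri, kind
end

print(classify_uri("/users"))                 --> "/users", "plain"
print(classify_uri([[/users/\d+/profile/]]))  --> "/users/\d+/profile", "regex"
```

Note that the backslash and the + in `/users/\d+/profile` are enough on their own to flip the URI into regex mode.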

Regex matching

The evaluation order of regexes is covered in the Proxy Reference[2] and in the sections below, but one important topic is worth a reminder: the PCRE flags with which Kong executes URI regexes.

When matching regex URIs, Kong makes use of the ngx.re.match()[5] utility from ngx_lua. It is executed with the following flags:

  • a: anchored mode (only match from the beginning)
  • j: enable PCRE JIT compilation
  • o: compile-once mode (similar to Perl’s /o modifier)

Be especially aware of the a flag: since matching is already anchored to the beginning of the URI, a leading caret ^ is unnecessary, so avoid it.
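The effect of the a flag can be sketched in plain Lua. Standard Lua has no PCRE, so Lua patterns stand in for regexes here (`%d` plays the role of `\d`). Inside Kong the real call is ngx.re.match(uri, regex, "ajo"). `anchored_find` is a hypothetical helper. The last line illustrates a consequence of anchoring only the start of the subject: a longer path is still accepted unless the regex ends with $.

```lua
-- Sketch only: Lua patterns stand in for PCRE regexes.
-- In Kong the equivalent call is ngx.re.match(uri, regex, "ajo").
local function anchored_find(subject, pattern)
  -- Prepending "^" mimics PCRE's "a" flag: the match must begin at the
  -- first character of the subject. The end of the subject is NOT anchored.
  return subject:find("^" .. pattern) ~= nil
end

print(anchored_find("/users/123/profile", "/users/%d+/profile"))        --> true
print(anchored_find("/api/users/123/profile", "/users/%d+/profile"))    --> false
print(anchored_find("/users/123/profile/extra", "/users/%d+/profile"))  --> true
```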

Regex URI pitfalls

Creating APIs with regexes
  1. URL-encode your regexes. We keep receiving reports from people sending non-URL-encoded regexes (e.g. via curl -d instead of curl --data-urlencode) and complaining that Kong won’t accept them. It sure won’t: Kong URL-decodes form-urlencoded payloads, which can corrupt your regexes (for instance, a + may be decoded into a space). The Proxy Reference[2] has a section dedicated to this topic.
  2. In form-urlencoded payloads, be wary of using commas in your regexes. You must escape commas like so: /users/\d{1\,3}/profile. This is because Kong’s format for arrays in form-urlencoded payloads expects comma-separated strings: to the Kong Admin API, a comma is a separator between two values (you can see how that is problematic for a regex…)
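Here is a minimal sketch of preparing a regex URI for a form-urlencoded Admin API request, combining both points above. `escape_commas` and `urlencode` are hypothetical helpers, not part of Kong. The encoder keeps only RFC 3986 unreserved characters, similar to what curl --data-urlencode does to a value. Commas are escaped first, so that the backslash survives Kong's URL-decoding.

```lua
-- Hypothetical helpers (not part of Kong) for sending a regex URI
-- as a form-urlencoded field to the Admin API.

-- Escape commas so Kong does not split the value into several array items.
local function escape_commas(s)
  return (s:gsub(",", "\\,"))
end

-- Percent-encode every byte outside the RFC 3986 unreserved set.
local function urlencode(s)
  return (s:gsub("[^%w%-%._~]", function(c)
    return string.format("%%%02X", c:byte())
  end))
end

local regex = [[/users/\d{1,3}/profile]]
local body = "uris=" .. urlencode(escape_commas(regex))
print(body)  --> uris=%2Fusers%2F%5Cd%7B1%5C%2C3%7D%2Fprofile
```

Once Kong decodes this body, it sees /users/\d{1\,3}/profile, which is exactly the escaped form described in point 2.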
Regex shadowing

Regexes can shadow each other. Kong sorts plain URIs by length, because that guarantees that the longest URI gets a chance to be evaluated for a match first, but that is not possible for regexes.

Two regexes of different lengths can match the same subject. Hence, Kong follows (for now) the NGINX idiom: in NGINX, location blocks with regexes are evaluated in the order that they are defined.

Well, in Kong, URI regexes are also evaluated in the order in which they are defined. This means that given two APIs with different created_at values and different regexes that would both match the same subject (e.g. /request), the API created first is evaluated… first. Effectively, the second API can never receive such requests: its regex URI never gets a chance to match them. With this model, the only way to change the ordering is to re-create the APIs in the desired order…
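The two evaluation orders described above can be illustrated with a toy router. This is a hypothetical sketch, not Kong's router. Lua patterns stand in for PCRE, hosts and methods are ignored, and how Kong interleaves plain and regex URIs is not modelled. Plain URIs are tried longest first, while regex APIs are tried in creation order, so an older, broader regex shadows a newer one.

```lua
-- Toy router (hypothetical): Lua patterns stand in for PCRE regexes.
local plain_uris = { "/users", "/users/profile" }

-- Regex APIs listed in creation order (oldest first).
local regex_apis = {
  { name = "api-created-first",  uri = "/%w+" },     -- also matches /request
  { name = "api-created-second", uri = "/request" }, -- shadowed for /request
}

-- Plain URIs: longest first, so the most specific prefix wins.
table.sort(plain_uris, function(a, b) return #a > #b end)

local function match_plain(path)
  for _, prefix in ipairs(plain_uris) do
    if path:sub(1, #prefix) == prefix then
      return prefix
    end
  end
  return nil
end

-- Regex URIs: creation order, the first anchored match wins.
local function match_regex(path)
  for _, api in ipairs(regex_apis) do
    if path:find("^" .. api.uri) then
      return api.name
    end
  end
  return nil
end

print(match_plain("/users/profile/settings"))  --> /users/profile
print(match_regex("/request"))                 --> api-created-first
```

Swapping the two entries in regex_apis (the equivalent of re-creating the APIs in the other order) makes /request reach api-created-second instead.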

To address this concern, we have an upcoming update to our data model that introduces a field to adjust the evaluation order of regexes after their creation. This is not released yet as of today (February 7th, 2018), but is scheduled for our upcoming 0.13.0 release (see the Changelog[6] for target dates).

Using regex URIs in plugins

In plugins, it is possible to retrieve the regex that matched the current request (and thus caused the plugin to run) through the ngx.ctx.router_matches variable. This variable contains a table with the following structure:

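```lua
ngx.ctx.router_matches = {
    -- captured groups, indexable by position and by name
    uri_captures = {
        [1] = "123",
        user_id = "123"
    },
    -- the URI (here a regex) of the API definition that matched
    uri = [[/users/(?P<user_id>\d+)/profile]],
    host = "example.com",
    method = "GET"
}
```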

The above example assumes an API definition similar to:

```json
{
    "name": "example",
    "hosts": ["example.com"],
    "uris": ["/profile", "/foo", "/users/(?P<user_id>\\d+)/profile"],
    "methods": ["GET"]
}
```

And the following request:

```http
GET /users/123/profile HTTP/1.1
Host: example.com
```

By using this table, plugin developers can detect which conditions triggered a match of the current request to a particular API.
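For example, a plugin could branch on which of the API's URIs matched. This is a hypothetical sketch: `pick_behavior` and its return values are illustrative only. It assumes, as the example above suggests, that matches.uri holds the URI exactly as stored in the API definition. Inside a plugin, you would pass ngx.ctx.router_matches instead of a literal table.

```lua
-- Hypothetical plugin logic: decide what to do based on which URI matched.
-- `matches` is expected to have the shape of ngx.ctx.router_matches.
local PROFILE_REGEX = [[/users/(?P<user_id>\d+)/profile]]

local function pick_behavior(matches)
  if matches.uri == PROFILE_REGEX then
    return "per-user"   -- the regex URI matched, captures are available
  elseif matches.uri == "/profile" or matches.uri == "/foo" then
    return "shared"     -- one of the plain URIs matched
  end
  return "default"
end

-- In a plugin phase handler: pick_behavior(ngx.ctx.router_matches)
print(pick_behavior({ uri = PROFILE_REGEX, host = "example.com", method = "GET" }))  --> per-user
print(pick_behavior({ uri = "/foo" }))  --> shared
```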

Readers familiar with PCRE will appreciate the support for capturing groups (named or not), as illustrated by the captured user_id value. :+1:

Readers familiar with Lua will appreciate that such captured groups are provided in a table containing both array and map parts, allowing one to index a captured group by name or index (or even iterate over the list of captured groups). :smiley:
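The snippet below shows both ways of reading a capture, plus iteration over the positional captures. It uses a literal copy of the table above so that it runs outside Kong; in a plugin you would read ngx.ctx.router_matches instead. Defaulting uri_captures to an empty table is an assumption, in case it is absent when a plain URI matched.

```lua
-- Stand-in for ngx.ctx.router_matches, copied from the example above.
local matches = {
  uri_captures = { [1] = "123", user_id = "123" },
  uri = [[/users/(?P<user_id>\d+)/profile]],
  host = "example.com",
  method = "GET",
}

-- Assumption: uri_captures may be nil for plain URIs, so default it.
local captures = matches.uri_captures or {}

print(captures.user_id)  --> 123  (map part: by name)
print(captures[1])       --> 123  (array part: by position)

-- ipairs walks only the array part, i.e. the captures in positional order
local positional = {}
for i, value in ipairs(captures) do
  positional[i] = value
end
print(#positional)  --> 1
```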

References

As of today (0.12.1 being the most recent version), the code related to the topics discussed here can be found in the following places (yay for Open Source):

Conclusion

Regex support in matching rules has come a long way since the early days of Kong, but there is still much to do with regard to usability and documentation. Both are currently being addressed behind the scenes by the Kong team, and we hope to make regex URIs simpler to use in the future. Kong being an open source project, you are also most welcome to contribute improvements to regex URIs! You can start by reading the CONTRIBUTING.md file :slight_smile:
