Remove Specific Storage Dependencies


#1

Why isn’t the datastore completely decoupled, so that users can leverage their existing storage solutions? Kong does not seem to justify anything more than a shared file across the cluster, let alone a full-blown enterprise RDBMS or a high-performance distributed store such as Cassandra. That seems to defy basic separation of concerns. I polled engineers within the organization, and everybody had precisely the same complaint. What if we want to use Redis, for example, which is arguably several orders of magnitude more prevalent than either of the choices available to Kong users?


#2

Hi! Thanks for bringing up those concerns. This is definitely something we also discuss internally. If you look at the Kong codebase, the datastore is essentially decoupled (after all, it’s what allows us to support both Postgres and Cassandra, which are indeed very different beasts). Adding support for a third one (or even for none!) is definitely possible, but besides the work of adding a new DAO backend for a different persistence solution, one must take into consideration the long-term maintenance of these backends. Having more entries in a support matrix adds a lot more support overhead than the initial implementation effort alone.
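To illustrate what "essentially decoupled" can look like, here is a minimal sketch in Python (not Kong's Lua, and not Kong's actual DAO API). The names `EntityStore`, `InMemoryStore` and `register_route` are hypothetical; the point is only that calling code depends on an interface, and each supported database is one more implementation of it that has to be maintained.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class EntityStore(ABC):
    """Hypothetical storage interface; NOT Kong's real DAO API."""

    @abstractmethod
    def insert(self, kind: str, key: str, value: dict) -> None: ...

    @abstractmethod
    def select(self, kind: str, key: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, kind: str, key: str) -> bool: ...


class InMemoryStore(EntityStore):
    """Stand-in backend; a Postgres or Cassandra backend would
    implement the same three methods against a real server."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def insert(self, kind: str, key: str, value: dict) -> None:
        self._rows[(kind, key)] = dict(value)

    def select(self, kind: str, key: str) -> Optional[dict]:
        row = self._rows.get((kind, key))
        return dict(row) if row is not None else None

    def delete(self, kind: str, key: str) -> bool:
        # True if something was actually removed.
        return self._rows.pop((kind, key), None) is not None


def register_route(store: EntityStore, name: str, upstream: str) -> None:
    # Calling code sees only the interface, never a concrete backend.
    store.insert("routes", name, {"upstream": upstream})
```

Every additional backend would have to pass the same behavioural tests as the existing ones, which is where the long-term support cost comes from.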


#3

Having any external database dependency has good and bad sides. One of the biggest good sides is that the database is actually what forms a Kong cluster. You can keep adding nodes, and as long as they connect to the same database they will be part of the same cluster. The database is also used to synchronize the Kong cache across the cluster without Kong nodes knowing anything about the other Kong nodes. We could add more DAO adapters, such as Redis. But we think PostgreSQL and Cassandra are quite good on their own. Cassandra has built-in synchronization between multiple Cassandra nodes across geographical locations. And PostgreSQL is a tried-and-true, more traditional SQL database with a lot of tooling.

So I think it separates the concerns quite nicely.

  1. there is a database (central or distributed)
  2. there are Kong nodes
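The idea that nodes form a cluster only by sharing a database can be sketched roughly as follows. This is an illustration in Python using an in-memory SQLite database as a stand-in for Postgres or Cassandra; the `entities` and `events` tables, the `Node` class and the polling approach are assumptions made for the example, not a description of Kong's actual schema or mechanism.

```python
import sqlite3


def make_shared_db() -> sqlite3.Connection:
    # Stand-in for the central database that all nodes connect to.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entities (key TEXT PRIMARY KEY, value TEXT)")
    db.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT)"
    )
    return db


class Node:
    """Hypothetical node: it knows the database, not the other nodes."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.cache = {}
        self.last_event_id = 0

    def get(self, key):
        # Serve from the local cache; hit the database only on a miss.
        if key not in self.cache:
            row = self.db.execute(
                "SELECT value FROM entities WHERE key = ?", (key,)
            ).fetchone()
            self.cache[key] = row[0] if row else None
        return self.cache[key]

    def write(self, key, value) -> None:
        # Store the change and record an invalidation event for everyone.
        self.db.execute(
            "INSERT OR REPLACE INTO entities (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.db.execute("INSERT INTO events (key) VALUES (?)", (key,))
        self.db.commit()

    def poll(self) -> int:
        # Drop cached entries for every event not seen yet.
        rows = self.db.execute(
            "SELECT id, key FROM events WHERE id > ? ORDER BY id",
            (self.last_event_id,),
        ).fetchall()
        for event_id, key in rows:
            self.cache.pop(key, None)
            self.last_event_id = event_id
        return len(rows)
```

In this sketch a second `Node` created with the same connection joins the "cluster" with no extra configuration, and it sees another node's write after its next `poll()`.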

And we need to remember that many Kong plugins store data in the database too, and some plugins use Redis (as an option). Normally Kong plugins cache entities from the database (at start or when warmed up), so that in many situations Kong doesn’t make any database calls (I would say it is an exception when it does, and Kong’s Admin API is an obvious example of such an exception -> we need to write the changes somewhere).

A shared file across a cluster seems pretty hard to implement so that it performs well. Of course it works if that file is static (doesn’t change once Kong is started), but Kong is really a dynamically configurable product. You can add entities to the database (by calling Kong Admin API endpoints) without restarting Kong, and the whole cluster gets notified about it automatically (and fast, without any other process notifying the nodes separately). And all this happens in a non-blocking fashion. There is hardly any such thing as non-blocking file I/O; I do know some mechanisms exist, but they are not really solutions to this problem. And then you need to worry about concurrency with a single file in a cluster (all that locking etc.). And that leads to performance issues. So we end up writing a database server.

Redis is great. It started as a single-threaded server, but it seems they are adding more threads (v. 4.0 adds one for background deletion). Everyone has their favourite one. It is quite hard to support all of them, and we think Cassandra and Postgres are quite good choices. Could we have chosen something else? Absolutely.

But as said, we cache aggressively, so it is very much a non-issue to us. And we would cache even with Redis. It is a lot faster to get the data from the Nginx worker-level cache or the Nginx node-level shared memory cache than to make a network (even a loopback) round-trip to any database. In that sense “all the databases” are equally fast when used in a warmed-up Kong cluster.
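The lookup order described above (worker-level cache, then node-level shared memory, then the database) can be sketched like this. It is a simplified Python illustration; `TieredCache` and `load_from_db` are hypothetical names, and a plain dict stands in for Nginx shared memory.

```python
class TieredCache:
    """Hypothetical per-worker cache in front of a node-wide shared cache."""

    def __init__(self, shared: dict, load_from_db) -> None:
        self.worker = {}           # private to this worker
        self.shared = shared       # shared by all workers on the node
        self.load_from_db = load_from_db
        self.db_calls = 0          # counts database round-trips

    def get(self, key):
        # 1. Worker-level cache: no locking, no copying.
        if key in self.worker:
            return self.worker[key]
        # 2. Node-level shared cache: still no network round-trip.
        if key in self.shared:
            value = self.shared[key]
        else:
            # 3. Only now go to the database, then fill the shared level.
            self.db_calls += 1
            value = self.load_from_db(key)
            self.shared[key] = value
        self.worker[key] = value
        return value
```

Once both levels are warm, `db_calls` stops growing regardless of which database sits behind `load_from_db`, which is the sense in which all databases become equally fast.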

And as Hisham said, you can set up your database or database clusters however you want. Kong is just a client to those (Cassandra or Postgres); it is not like Kong owns them.

There is a lot we could do; some ideas are:

  1. An in-memory database (with Redis-style dumps) using e.g. Nginx shared memory -> that has no dependencies, but CANNOT be clustered, at least not easily (very simple, and could be something we need for better support of a Service Mesh architecture -> sidecar deployment of a proxy together with a microservice + a separate control plane).
  2. Add support for Database Server X (such as Redis).
  3. Use system management tools to make changes across the cluster (e.g. whenever you change the configuration, you push a new configuration file across the cluster and make the nodes “reload”); this could also form a control plane, or we could utilize something existing (with dependencies of its own).
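As a rough picture of the first idea, an in-memory store with snapshot dumps could look like the sketch below. It is written in Python purely for illustration; `SnapshotStore`, the JSON format and the file path are assumptions, and it says nothing about how this would be done on top of Nginx shared memory.

```python
import json
import os
import tempfile


class SnapshotStore:
    """Hypothetical single-node store: data lives in memory, dumps go to disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {}
        # Restore the last snapshot, if one exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key: str, value) -> None:
        self.data[key] = value

    def get(self, key: str, default=None):
        return self.data.get(key, default)

    def dump(self) -> None:
        # Write to a temporary file first, then replace the snapshot in
        # one step, so a crash never leaves a half-written file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)
```

Nothing in this sketch lets a second node learn about a change, which is the clustering limitation mentioned in the first item.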

But separating things can also add complexity. E.g. why do we have a separate control plane, synchronization technologies, and multiple different tools to make this work? Why can’t we just use a database, one might argue.

If you write plugins for Kong, you can make different choices than what Kong at its core has made, such as using different database servers.