Let's Have a War • Slick Connection Pooling[web search]
Slick (formerly ScalaQuery) is a nice scala library for accessing your database, works with many vendors and it’s part of the typesafe stack.
Slick’s documentation is awesome to get you started quickly. Unfortunately there are some important topics missing from the docs or examples which are mandatory for using it in a production environment.
A simple example
Let’s create a data access object and a main
App to exercise it. Following the official docs this is kinda trivial:
Great! A few things to note:
The code is elegant and simple.
There is a minor workaround needed to use auto-increment fields in postgress. More info can be found in this Stack Overflow question. Also, I’ve filed a ticket for it so it’s hopefully going to get fixed soon.
Unfortunately, this code as it is can never go to production.
What’s wrong with that?
This code does not use a database connection pool. If you used other ORM or high-level DAO libraries you probably didn’t have to think about that since the framework/library handled it automatically for you. Slick doesn’t.
So? well this means that for every database session (
withSession blocks) Slick is opening a new connection to the database server.
The cost of creating and closing a connection every time is prohibitive. We need to reuse them.
Let’s put it to work. Check this line:
Database.forURL("jdbc:postgresql://localhost:5432/demo", driver = "org.postgresql.Driver")
We use Slick’s Database object to create a new database instance from a jdbc url. The scaladocs say that’s not the only way of getting a
Database instance, we can also get one from a
c3p0 has a class called
ComboPooledDataSource which works as a pooled and configurable version of
With the pool in place, let’s run some benchmarks to check the performance improvement, for this I’m gonna use this little benchmarking snippet
RESULTS (without connection pool): -------- mean: 29 ms 50% : 28 ms 75% : 30 ms 90% : 33 ms 99% : 39 ms
RESULTS (with connection pool): -------- mean: 12 ms 50% : 12 ms 75% : 14 ms 90% : 15 ms 99% : 16 ms
Indeed we cut our response times by 50%! Neat!
Should we really worry about 10 msecs?
Yes. We should.
The absolute numbers here are deceiving since most of the time saved is network overhead, which is not a lot when both the database server and the application run on the same box. This is common in a development environment like this but not likely in production.
Let’s run these benchmarks against a remote and thus high latency postgress server, using Heroku (note that I live in Argentina so latency is higher than you may experience):
RESULTS (without connection pool): -------- mean: 8069 ms 50% : 7985 ms 75% : 8393 ms 90% : 9523 ms 99% : 9523 ms
RESULTS (with connection pool): -------- mean: 816 ms 50% : 786 ms 75% : 883 ms 90% : 953 ms 99% : 953 ms
Slick is great and has fantastic docs but sometimes to get past the getting started example you need to hack a bit. Never use it (or any data access lib) in prod without a database connection pool.
(all code available at this github repository)