Let's Have a War • Slick Connection Pooling

Jay Taylor's notes

Let's Have a War • Slick Connection Pooling

Original source (fernandezpablo85.github.io)
Tags: scala orm slick fernandezpablo85.github.io
Clipped on: 2014-04-16
Meet SlickSlick (formerly ScalaQuery) is a nice scala library for accessing your database, works with many vendors and it’s part of the typesafe stack.
Slick’s documentation is awesome to get you started quickly. Unfortunately there are some important topics missing from the docs or examples which are mandatory for using it in a production environment.
A simple exampleLet’s create a data access object and a main App to exercise it. Following the official docs this is kinda trivial:

      
        
          
            1
            2
            3
            4
            5
            6
            7
            8
            9
            10
            11
            12
            13
            14
            15
            16
            17
            18
            19
            20
            21
            22
            23
            24
            25
            26
            27
            28
            29
            30
            31
            32
            33
            34
            35
            36
            37
            38
            39
            40
          
          
            package org.example
 
import scala.slick.driver.PostgresDriver.simple._
import Database.threadLocalSession
 
object UserDao {
  
  val database = Database.forURL("jdbc:postgresql://localhost:5432/demo", 
                          driver = "org.postgresql.Driver")
 
  case class User(id: Option[Int], name: String, last: String)
 
  object Users extends Table[User]("users") {
    def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
    def name = column[String]("name")
    def lastName = column[String]("last_name")
 
    def * = (id.? ~ name ~ lastName) <> (User, User.unapply _)
 
    // Shim needed to handle Postgress - Slick quirks with autoincrement columns.
    def insertProjection = (name ~ lastName) <> (
      {(first, last) => User(None, first, last)},
      {(user: User) => Some(user.name, user.last)}
    )
  }
 
  def dropAndCreateTable() = database withSession {
    Users.ddl.drop
    Users.ddl.create
  }
 
  def insert(name: String, lastName: String) = database withSession {
    Users.insertProjection.insert(User(None, name, lastName))
  }
 
  def findByLastName(last: String) = database withSession {
    val query = for (u <- Users if u.lastName is last) yield (u)
    query.list
  }
}

          
        

    

            view raw
            UserDao.scala
            hosted with ❤ by GitHub
          

      
        
          
            1
            2
            3
            4
            5
            6
            7
            8
            9
            10
            11
            12
            13
            14
          
          
            package org.example
 
object Main extends App {
 
  UserDao.dropAndCreateTable()
 
  UserDao.insert("pablo", "fernandez")
  UserDao.insert("noob", "saibot")
  UserDao.insert("uma", "thurman")
 
  val users = UserDao.findByLastName("fernandez")
  
  println(users.head)
}

          
        

    

            view raw
            UserMain.scala
            hosted with ❤ by GitHub
          
Great! A few things to note:
The code is elegant and simple.
There is a minor workaround needed to use auto-increment fields in postgress. More info can be found in this Stack Overflow question. Also, I’ve filed a ticket for it so it’s hopefully going to get fixed soon.
Unfortunately, this code as it is can never go to production.
What’s wrong with that?This code does not use a database connection pool. If you used other ORM or high-level DAO libraries you probably didn’t have to think about that since the framework/library handled it automatically for you. Slick doesn’t.
So? well this means that for every database session (withSession blocks) Slick is opening a new connection to the database server.
The cost of creating and closing a connection every time is prohibitive. We need to reuse them.
Meet c3p0C3PO is a versatile and highly tunable connection pool library used by many ORMs (e.g. Hibernate). It has really good docs too, with information about every configuration value.
Let’s put it to work. Check this line:
Database.forURL("jdbc:postgresql://localhost:5432/demo", driver = "org.postgresql.Driver")
We use Slick’s Database object to create a new database instance from a jdbc url. The scaladocs say that’s not the only way of getting a Database instance, we can also get one from a javax.sql.Datasource.
c3p0 has a class called ComboPooledDataSource which works as a pooled and configurable version of Datasource:

      
        
          
            1
            2
            3
            4
            5
            6
            7
            8
          
          
            import com.mchange.v2.c3p0.ComboPooledDataSource
 
val database = {
  val ds = new ComboPooledDataSource
  ds.setDriverClass(Driver)
  ds.setJdbcUrl(Local)
  Database.forDataSource(ds)
}

          
        

    

            view raw
            c3p0.scala
            hosted with ❤ by GitHub
          
With the pool in place, let’s run some benchmarks to check the performance improvement, for this I’m gonna use this little benchmarking snippet
RESULTS (without connection pool):
--------
 
mean:   29 ms 
50% :   28 ms 
75% :   30 ms 
90% :   33 ms 
99% :   39 ms 
RESULTS (with connection pool):
--------
 
mean:   12 ms 
50% :   12 ms 
75% :   14 ms 
90% :   15 ms 
99% :   16 ms 
Indeed we cut our response times by 50%! Neat!
Should we really worry about 10 msecs?Yes. We should.
The absolute numbers here are deceiving since most of the time saved is network overhead, which is not a lot when both the database server and the application run on the same box. This is common in a development environment like this but not likely in production.
Let’s run these benchmarks against a remote and thus high latency postgress server, using Heroku (note that I live in Argentina so latency is higher than you may experience):
RESULTS (without connection pool):
--------
 
mean:   8069 ms 
50% :   7985 ms 
75% :   8393 ms 
90% :   9523 ms 
99% :   9523 ms 
RESULTS (with connection pool):
--------
 
mean:   816 ms 
50% :   786 ms 
75% :   883 ms 
90% :   953 ms 
99% :   953 ms 
ConclusionSlick is great and has fantastic docs but sometimes to get past the getting started example you need to hack a bit. Never use it (or any data access lib) in prod without a database connection pool.
(all code available at this github repository)