Testing HAProxy reload - The Ninja SysAdmin
HAProxy load balancer testing
I needed a way to stop requests being dropped when adding new nodes to my HAProxy load balancer. Initially I thought that by upgrading to a later version (1.4.9) I could take advantage of the `reload` function in the init script, something the community has been screaming out for for a long time.
After installing HAProxy, I checked the init script and could see the function. It looked like the right one, and following the code I could see that it would preserve existing connections while loading a new binary.
The interesting portion of the init script function is `$exec -D -f /etc/$prog/$prog.cfg -p /var/run/$prog.pid -sf $(cat /var/run/$prog.pid)`, which starts a new process with the current config file and passes it the PID of the old one (`-sf`), so that the old process is told to stop listening, finish its existing connections and then exit. Well, that's what I thought; it looks like I was wrong.
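To make that line easier to follow, here is a minimal sketch of the same reload step as a standalone shell function. The helper names (`reload_cmd`, `graceful_reload`), the `DRY_RUN` switch and the default paths are my own assumptions for illustration, not part of the shipped init script. The HAProxy flags themselves (`-D`, `-f`, `-p`, `-sf`, and `-c` for a config check) are real.

```shell
#!/bin/sh
# Sketch of a graceful reload, modelled on the init script line above.
# Assumptions: helper names, DRY_RUN and the default paths are illustrative.

prog=${prog:-haproxy}
cfg=${cfg:-/etc/$prog/$prog.cfg}
pidfile=${pidfile:-/var/run/$prog.pid}

# Print the command a reload would run, given the old PID(s) as arguments.
reload_cmd() {
    # -D  run as a daemon     -f  config file to load
    # -p  file to write the new PID(s) to
    # -sf after startup, ask the listed old PIDs to finish and exit
    echo "$prog -D -f $cfg -p $pidfile -sf $*"
}

graceful_reload() {
    old_pids=$(cat "$pidfile" 2>/dev/null)

    # Defaults to dry-run so the sketch is safe to paste into a shell.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        reload_cmd $old_pids
        return 0
    fi

    # Validate the config first so a typo never replaces a working process.
    "$prog" -c -f "$cfg" || return 1

    # $old_pids is unquoted on purpose: the PID file may list several PIDs.
    "$prog" -D -f "$cfg" -p "$pidfile" -sf $old_pids
}
```

Calling `graceful_reload` as-is only prints the command. Running it with `DRY_RUN=0` as root would perform the actual reload, assuming `haproxy` is on the `PATH`.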
I fired up a machine I have with HTTPerf installed and ran some connection tests without issuing any init commands, to get a feel for how my load balancer stacks up. Not a problem: the load balancer easily took 1000 requests per second without bottlenecking, which shows that my plan to implement HAProxy wouldn't slow down my application stack.
Now to try the tests whilst issuing a `reload` on the load balancer. I fired up the HTTPerf test again, waiting 2 seconds before issuing the `reload` (I used a script to make this fair). The results were not conclusive, but they did indicate that some requests were getting HTTP 5xx responses, which breaks my application stack. Effectively this gives users a 503 Service Unavailable error, which is something I really don't want to do.
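For anyone wanting to repeat this, the timing script can be sketched roughly as follows. This is not the exact script I used. The host name, run duration and init script path are placeholders, and counting 5xx replies from httperf's `Reply status:` summary line is just one way to score a run (refused or reset connections show up in httperf's `Errors:` line instead).

```shell
#!/bin/sh
# Sketch of a "load, wait 2 seconds, then reload/restart" test.
# Assumptions: LB_HOST, DURATION and INIT_SCRIPT are placeholder values.

LB_HOST=${LB_HOST:-lb.example.test}
DURATION=${DURATION:-10}                      # seconds of load per run
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/haproxy}

# Read httperf output on stdin and print the 5xx count from the
# "Reply status: 1xx=.. 2xx=.. 3xx=.. 4xx=.. 5xx=.." summary line.
count_5xx() {
    sed -n 's/^Reply status:.* 5xx=\([0-9][0-9]*\).*/\1/p'
}

# run_test RATE ACTION
#   RATE   connections per second
#   ACTION "reload" or "restart", issued 2 seconds after the load starts
run_test() {
    rate=$1
    action=$2
    out=$(mktemp)

    httperf --server "$LB_HOST" --port 80 --uri / \
            --rate "$rate" --num-conns $((rate * DURATION)) > "$out" 2>&1 &
    load_pid=$!

    sleep 2                      # same delay every run, to keep it fair
    "$INIT_SCRIPT" "$action"

    wait "$load_pid"
    echo "$rate conn/s, $action: $(count_5xx < "$out") replies with 5xx"
    rm -f "$out"
}
```

A run would then look like `run_test 100 reload` followed by `run_test 100 restart`, repeated a few times per rate.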
I decided to scale down the request rate and try the tests again, and got the same results. Below are my findings for `reload` versus `restart`.
| Test | `reload` Result 1 | `reload` Result 2 | `reload` Result 3 | `restart` Result 1 | `restart` Result 2 | `restart` Result 3 |
|---|---|---|---|---|---|---|
| Test 1: 10 connections per second | 0 | 0 | 0 | 1 | 2 | 1 |
| Test 2: 50 connections per second | 5 | 0 | 0 | 8 | 10 | 9 |
| Test 3: 100 connections per second | 1 | 0 | 9 | 12 | 12 | 16 |
| Test 4: 200 connections per second | 2 | 8 | 0 | 25 | 33 | 59 |
| Test 5: 400 connections per second | 0 | 4 | 0 | 96 | 86 | 62 |
To sum up, I'm going to have to investigate using Nginx as a load balancer to get around this issue, which we all hoped had been resolved... unless I'm missing something? Who knows? I'll find out and get to the bottom of it.