Whenever you start to manage a cluster of servers, you quickly understand the pain of configuring them. Small, in-house clusters are manageable… but once you start to scale to ten, twenty, or a hundred servers, you’d better have a systematic, automated way to provision and configure new servers.
There are a number of well-known configuration management tools, such as Chef and Puppet. These are great tools, but they are relatively complicated and take some time to get up and running.
If you are on EC2, I highly recommend the excellent Chef Cookbook for bootstrapping an ElasticSearch cluster.
I use Ansible, a lightweight configuration management tool that works over SSH and doesn’t require preinstalled software to bootstrap. It is useful for both automated deployment/provisioning (e.g. configure a bare OS server with everything required to join an ES cluster) as well as ad hoc commands (e.g. update the ES configuration file for all the nodes in the cluster).
This tutorial is going to be a simple walkthrough on installing ElasticSearch on a bare Debian 6 OS through Ansible. A word of warning: there is a near 100% probability you’ll have to adjust parts of this tutorial for your specific setup. Don’t take it as gospel; use it as a guide.
In this article, we are going to setup and provision a 3-node ElasticSearch cluster. There is also a “Director” server which will be running Ansible and providing the setup instructions. The Director will not be participating in the ES cluster.
The first step is to install Ansible on your Director server. Ansible has a few basic dependencies, which you may already have installed. In this article, we are starting with completely barebones Debian 6 – a minimal installation that doesn’t even have Python yet. So, install the dependencies:
$ apt-get install python2.6 python-paramiko python-yaml python-jinja2 git
And then obtain Ansible itself from Github:
$ git clone git://github.com/ansible/ansible.git
$ cd ./ansible
$ source ./hacking/env-setup
All done. That was easy, right?
Since Ansible works over SSH, you need to make sure your Director server has the SSH keys to the rest of your cluster.
Word of Warning: For brevity and simplicity, this article uses root for all node configurations. A better solution is configuring sudo appropriately, which Ansible supports. If you are using a setup like this for production, I strongly recommend configuring appropriate user privileges.
First, setup your SSH keys on the Director server (you can skip this if you’ve already generated keys in the past):
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/polyfractal/.ssh/id_rsa):
Created directory '/home/polyfractal/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/polyfractal/.ssh/id_rsa.
Your public key has been saved in /home/polyfractal/.ssh/id_rsa.pub.
Then authenticate into your cluster nodes, create a directory at ~/.ssh on each cluster node, and append the Director’s SSH keys to the cluster node’s authorized_keys file. Repeat this process for each node, replacing `root@Cluster1` with the appropriate IP or domain of your cluster node:
$ ssh root@Cluster1 mkdir -p .ssh
$ cat ~/.ssh/id_rsa.pub | ssh root@Cluster1 'cat >> .ssh/authorized_keys'
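On systems that ship OpenSSH’s ssh-copy-id helper, the two commands above can be collapsed into one step (this assumes your key lives at the default path):

```shell
$ ssh-copy-id root@Cluster1
```

Either approach ends with the Director’s public key in the node’s authorized_keys file; use whichever your system supports.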
Ansible is incredibly lightweight, but it does require Python to be installed on a system. Most OS installations come with Python…but not ours. Luckily, Ansible supports `raw` commands that are sent straight over SSH. This allows you to bootstrap anything you need through Ansible.
So in our case, the first step is to send out a command that installs Python. However, before we can do that, we need to tell Ansible about our nodes. Open up /etc/ansible/hosts and add your node IPs/domains:
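A minimal /etc/ansible/hosts might look like this – the IPs are placeholders, so substitute the addresses or domains of your own three nodes:

```ini
[es_cluster]
192.168.56.101
192.168.56.102
192.168.56.103
```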
The `[es_cluster]` tag is simply a way to group servers together and refer to them by name later. Not necessary, but very handy. With that done, we can send out an Ansible request to all nodes instructing them to install Python:
$ ansible es_cluster -u root -m raw -a "apt-get update && apt-get install -y python"
This line tells Ansible to talk to the `es_cluster` group that we created in /etc/ansible/hosts, using the root user (Ansible defaults to the current user, and since I’m logged in as a non-root user on the Director server, we have to specify the root user for the Ansible blast). It then specifies the “raw” module, which allows you to send arbitrary SSH commands.
Finally, it specifies the action that we want performed, namely an apt-get update and then installation of python. Ansible should pause, and then spew a lot of text from the Apt output. When Ansible is done (and assuming nothing went wrong), you can continue with the configuration.
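Before moving on, it can be worth a quick sanity check with Ansible’s built-in `ping` module, which simply verifies that each node is reachable over SSH and has a working Python:

```shell
$ ansible es_cluster -u root -m ping
```

If any node fails here, fix its SSH keys or Python install before proceeding – the rest of the walkthrough assumes all three nodes respond.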
Build a Playbook
The above command was a good demonstration of an ad hoc command – using Ansible to send out a single command that you need executed. This is great for initial bootstrapping, or for pushing configuration changes to nodes. But what about configuring the server with a long list of instructions?
For that, you need to build a Playbook. A playbook is simply a YAML file that defines a list of instructions, using any number of Ansible Modules.
All files referenced in the rest of this article can be found in this GitHub repo.
Let’s walk through the playbook:
- hosts: es_cluster
  user: root
  tasks:
    - name: install python-apt
      raw: "apt-get install -y python-apt"
    - name: update apt cache
      apt: update_cache=yes
    - name: install htop
      apt: pkg=htop state=installed
The YAML starts off with the `hosts` directive, which tells Ansible which group of nodes to interact with (identical to how we used it above in the ad hoc command). It also tells Ansible to use the root user again. If you were using sudo and a different user, this is the place to define that behavior (see the Ansible docs for how to do it).
The playbook then defines `tasks`, an ordered, idempotent list of actions. Each action is performed sequentially. If an action fails on a host, that host is removed from the rest of the run – but the non-failing nodes continue to process the Playbook. At the end of the Playbook, you’ll receive a list of nodes that succeeded/failed.
The first action is a `raw` command, which installs python-apt. This is necessary so that we can use the `apt` module later in the Playbook. This step may not be necessary if your OS has a more fully provisioned Python distribution.
The second action uses the `apt` module to update the Apt cache, and the third action installs htop. I just like having htop on systems – there is no reason you need that package.
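Once the playbook is saved to a file (the filename here is just an example – the GitHub repo linked above uses its own naming), you run it with the ansible-playbook command:

```shell
$ ansible-playbook provision.yml
```

Ansible will walk through each task in order on every host in `es_cluster` and print a recap of which nodes succeeded or failed at the end.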