The whole playbook from this blog post can be seen in this gist.
Ansible is a Python-based tool for automating application deployments and infrastructure setups. It's often compared with Capistrano, Chef, Puppet and Fabric. These comparisons don't always compare apples to apples as each tool has its own distinctive capabilities and philosophy on how best to automate deployments and system setups. The flexibility and conciseness with which tasks can be described also varies widely between tools.
I've had to use every tool mentioned above at one time or another for various clients. What I like most about Ansible is that it keeps playbooks concise without abstracting anything away. Some tools try to hide the differences between apt install and yum install, but I found those abstractions made for a steeper learning curve and made out-of-the-ordinary changes take longer to get working.
Ansible can be installed via pip and just needs an inventory file to begin being useful.
For this post I've tried to keep Ansible's files to a minimum. You can organise playbooks into separate files, set up Travis CI to test them and so on, but for the sake of simplicity I stick to the task of getting a load-balanced, two-node Django cluster set up with as few lines of code as I could.
A cluster of machines
I launched three Ubuntu 14.04 virtual machines on my workstation, configured them with the user mark, identical passwords and sudo access. I then added the three virtual machine IP addresses to my hosts file:
$ grep 192 /etc/hosts
192.168.223.131 web1
192.168.223.132 web2
192.168.223.133 lb
I copied my SSH public key to the ~/.ssh/authorized_keys file on each VM:
$ ssh-copy-id web1
$ ssh-copy-id web2
$ ssh-copy-id lb
I then made sure each host's ECDSA key fingerprint was stored in my ~/.ssh/known_hosts file by connecting to each host:
$ ssh lb uptime
$ ssh web1 uptime
$ ssh web2 uptime
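If you'd rather not connect to each host by hand, ssh-keyscan can collect the host keys in one pass. A quick sketch; note that this blindly trusts whatever keys the hosts present, so only use it on a network you trust:
$ ssh-keyscan lb web1 web2 >> ~/.ssh/known_hosts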
Running Ansible with an inventory
I installed Ansible:
$ pip install ansible==1.7.2
And created an inventory file:
$ cat inventory
[load_balancers]
lb ansible_ssh_host=lb
[app_servers]
web1 ansible_ssh_host=web1
web2 ansible_ssh_host=web2
The first token is the host alias used by Ansible; the second tells Ansible how to connect. I had already added web1, web2 and lb to my /etc/hosts file so I simply referred to those hostnames.
With an inventory file in place it's possible to test that Ansible can connect and communicate with each node in the cluster:
$ ansible all -i inventory -a uptime
lb | success | rc=0 >>
09:07:18 up 7:18, 2 users, load average: 0.05, 0.03, 0.05
web2 | success | rc=0 >>
09:07:18 up 7:19, 2 users, load average: 0.01, 0.02, 0.05
web1 | success | rc=0 >>
09:07:19 up 7:23, 2 users, load average: 0.00, 0.01, 0.05
A cluster of configuration files
Ansible will need to deploy configuration files, keys and certificates to our cluster. Below is the file and folder layout of this project:
$ tree
.
├── files
│   ├── nginx-app.conf
│   ├── nginx-load-balancer.conf
│   ├── nginx.crt
│   ├── nginx.key
│   ├── ntp.conf
│   ├── supervisord.conf
│   └── venv_activate.sh
├── inventory
└── playbook.yml
SSL-terminating load balancer
To create files/nginx.crt and files/nginx.key I generated a self-signed SSL certificate using openssl. The certificate won't be of much use to HTTPS clients that verify certificates against trusted authorities, but it's enough to demonstrate SSL termination by the load balancer in a local environment.
$ openssl req -x509 -nodes -days 365 \
-newkey rsa:2048 \
-keyout files/nginx.key \
-out files/nginx.crt
...
Common Name (e.g. server FQDN or YOUR name) []:localhost
...
There are two distinctive nginx configurations in this setup: the first is for the load balancer and the second is for the app servers.
I chose nginx for the load balancer as it supports SSL and caching, and it handles app servers restarting more gracefully than HAProxy.
$ cat files/nginx-load-balancer.conf
upstream app_servers {
    {% for host in groups['app_servers'] %}
    server {{ hostvars[host]['ansible_eth0']['ipv4']['address'] }} fail_timeout=5s;
    {% endfor %}
}

server {
    listen 80;
    server_name localhost;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443;
    server_name localhost;

    ssl on;
    ssl_certificate /etc/nginx/ssl/nginx.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx.key;

    location / {
        proxy_pass http://app_servers;
    }
}
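For reference, assuming each app server's eth0 address matches its entry in /etc/hosts above, the upstream block would render to something like this:
upstream app_servers {
    server 192.168.223.131 fail_timeout=5s;
    server 192.168.223.132 fail_timeout=5s;
}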
The nginx config for the app servers simply proxies requests to gunicorn:
$ cat files/nginx-app.conf
server {
    listen 80;
    server_name localhost;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
Keeping clocks in sync
I wanted to keep the clocks on each node in the cluster in sync so I created an NTP configuration file. I chose the European pool but there are pools all around the world listed on the NTP Pool Project site.
$ cat files/ntp.conf
server 0.europe.pool.ntp.org
server 1.europe.pool.ntp.org
server 2.europe.pool.ntp.org
server 3.europe.pool.ntp.org
If you wanted to use the North American pool you could replace the contents with the following:
server 0.north-america.pool.ntp.org
server 1.north-america.pool.ntp.org
server 2.north-america.pool.ntp.org
server 3.north-america.pool.ntp.org
Managing gunicorn and celeryd
I'll use supervisor to run the Django app server and the celery task queue. The files below contain a number of template variables; Ansible will fill them in when it deploys the files.
$ cat files/supervisord.conf
[program:web_app]
autorestart=true
autostart=true
command={{ home_folder }}/.virtualenvs/{{ venv }}/exec gunicorn faulty.wsgi:application -b 127.0.0.1:8000
directory={{ home_folder }}/faulty
redirect_stderr=True
stdout_logfile={{ home_folder }}/faulty/supervisor.log
user=mark
[program:celeryd]
autorestart=true
autostart=true
command={{ home_folder }}/.virtualenvs/{{ venv }}/exec python manage.py celeryd
directory={{ home_folder }}/faulty
redirect_stderr=True
stdout_logfile={{ home_folder }}/faulty/supervisor.log
user=mark
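With home_folder set to /home/mark and venv set to faulty (the values defined later in the playbook), the web_app command line would render to roughly the following:
command=/home/mark/.virtualenvs/faulty/exec gunicorn faulty.wsgi:application -b 127.0.0.1:8000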
I also need a bash file that will activate the virtualenv used by Django:
$ cat files/venv_activate.sh
#!/bin/bash
source {{ home_folder }}/.virtualenvs/{{ venv }}/bin/activate
$@
When venv_activate.sh is uploaded to each web server it'll be installed as /home/mark/.virtualenvs/faulty/exec.
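Any management command can then be run through that wrapper from the project directory, for example (assuming the paths above):
$ cd /home/mark/faulty
$ /home/mark/.virtualenvs/faulty/exec python manage.py migrate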
Writing the playbook
I wanted to keep the number of Ansible files I was working with as low as I thought sensible for this blog post. Ansible is very flexible in terms of file organisation so a more broken-up and organised approach is possible, but for this example I'll just use one playbook.
There's much more hardening that could have been done with these instances. For the sake of conciseness I kept the number of tasks down.
I've broken playbook.yml into sections and will walk through each one, explaining what actions are taking place and the reasoning behind them.
To see the whole playbook.yml file please see this gist.
SSH tightening
The first tasks disable root's ssh account and remove support for password-based authentication. I've already stored my public ssh key in each server's authorized_keys file so there's no need to log in with a password. Once these configuration changes are made the ssh server is restarted:
---
- name: SSH tightening
  hosts: all
  sudo: True
  tasks:
    - name: Disable root's ssh account
      action: >
        lineinfile
        dest=/etc/ssh/sshd_config
        regexp="^PermitRootLogin"
        line="PermitRootLogin no"
        state=present
      notify: Restart ssh

    - name: Disable password authentication
      action: >
        lineinfile
        dest=/etc/ssh/sshd_config
        regexp="^PasswordAuthentication"
        line="PasswordAuthentication no"
        state=present
      notify: Restart ssh

  handlers:
    - name: Restart ssh
      action: service name=ssh state=restarted
The three dashes at the top of the file are a YAML convention marking the start of a document.
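Once this play has run you can sanity-check it by attempting a login without your key; with password authentication disabled the connection should be refused (assuming your public key is the only credential installed on the server):
$ ssh -o PubkeyAuthentication=no web1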
Package cache
Every system needs APT's package cache updated. You can append update_cache=yes to individual package installations, but I found it was needed for every installation so I perform the update once per machine rather than once per package.
- name: Update APT package cache
  hosts: all
  gather_facts: False
  sudo: True
  tasks:
    - name: Update APT package cache
      action: apt update_cache=yes
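If you'd rather bundle the refresh with a specific installation instead, the apt module's cache_valid_time parameter updates the cache only when it's older than the given number of seconds. A sketch of what such a task could look like:
- name: Install ntp, refreshing the cache if it's over an hour old
  apt: name=ntp update_cache=yes cache_valid_time=3600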
Syncing to UTC
I then set the time zone on each machine to UTC and set up NTP to synchronise their clocks with the European NTP pool. Note that when dealing with dpkg-reconfigure you should pass --frontend noninteractive, otherwise Ansible will hang while dpkg-reconfigure waits for input that Ansible isn't configured to provide.
- name: Set timezone to UTC
  hosts: all
  gather_facts: False
  sudo: True
  tasks:
    - name: Set timezone variables
      copy: >
        content='Etc/UTC'
        dest=/etc/timezone
        owner=root
        group=root
        mode=0644
        backup=yes
      notify:
        - Update timezone
  handlers:
    - name: Update timezone
      command: >
        dpkg-reconfigure
        --frontend noninteractive
        tzdata

- name: Synchronise clocks
  hosts: all
  sudo: True
  tasks:
    - name: install ntp
      apt: name=ntp
    - name: copy ntp config
      copy: src=files/ntp.conf dest=/etc/ntp.conf
    - name: restart ntp
      service: name=ntp state=restarted
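A quick way to confirm the time zones and clocks agree across the cluster is to run date on every host:
$ ansible all -i inventory -a date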
I also made sure each machine has unattended upgrades installed.
- name: Setup unattended upgrades
  hosts: all
  gather_facts: False
  sudo: True
  tasks:
    - name: Install unattended upgrades package
      apt: name=unattended-upgrades
      notify:
        - dpkg reconfigure
  handlers:
    - name: dpkg reconfigure
      command: >
        dpkg-reconfigure
        --frontend noninteractive
        -plow unattended-upgrades
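dpkg-reconfigure -plow unattended-upgrades writes its settings to /etc/apt/apt.conf.d/20auto-upgrades, so one way to verify the handler ran is to read that file back (it's normally world-readable so sudo isn't needed):
$ ansible all -i inventory -a "cat /etc/apt/apt.conf.d/20auto-upgrades"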
Setting up the Django app servers
The Django app servers have the largest number of tasks. Here is a summarised list of what's being performed:
- Set up the Uncomplicated Firewall (ufw) to block all incoming traffic except TCP ports 22 and 80; HTTP traffic should only come from the load balancer and ssh logins are rate-limited to slow down brute-force attacks.
- Install Python's virtualenv and development libraries.
- Install git and checkout a public repo of a Django project hosted on Bitbucket.
- Copy over the virtual environment activation script which is used by every Django command.
- Install a virtual environment and the Django project's requirements.
- Set up and migrate Django's database. The database doesn't need to be shared between web servers for this application so it's just a local SQLite3 file on each individual server.
- Install supervisor and have it run the gunicorn app server and celery task runner.
- Launch an nginx reverse proxy which acts as a buffer between gunicorn and the load balancer.
- name: Setup App Server(s)
  hosts: app_servers
  sudo: True
  vars:
    home_folder: /home/mark
    venv: faulty
  tasks:
    - ufw: state=enabled logging=on
    - ufw: direction=incoming policy=deny
    - ufw: rule=limit port=ssh proto=tcp
    - ufw: rule=allow port=22 proto=tcp
    - ufw: >
        rule=allow
        port=80
        proto=tcp
        from_ip={{ hostvars['lb']['ansible_default_ipv4']['address'] }}

    - name: Install python virtualenv
      apt: name=python-virtualenv
    - name: Install python dev
      apt: name=python-dev
    - name: Install git
      apt: name=git

    - name: Checkout Django code
      git: >
        repo=https://bitbucket.org/marklit/faulty.git
        dest={{ home_folder }}/faulty
        update=no
    - file: >
        path={{ home_folder }}/faulty
        owner=mark
        group=mark
        mode=755
        state=directory
        recurse=yes

    - name: Install Python requirements
      pip: >
        requirements={{ home_folder }}/faulty/requirements.txt
        virtualenv={{ home_folder }}/.virtualenvs/{{ venv }}
    - template: >
        src=files/venv_activate.sh
        dest={{ home_folder }}/.virtualenvs/{{ venv }}/exec
        mode=755

    - command: >
        {{ home_folder }}/.virtualenvs/{{ venv }}/exec
        python manage.py syncdb --noinput
      args:
        chdir: '{{ home_folder }}/faulty'
    - command: >
        {{ home_folder }}/.virtualenvs/{{ venv }}/exec
        python manage.py migrate
      args:
        chdir: '{{ home_folder }}/faulty'

    - name: Install supervisor
      apt: name=supervisor
    - template: >
        src=files/supervisord.conf
        dest=/etc/supervisor/conf.d/django_app.conf
    - command: /usr/bin/supervisorctl reload
    - supervisorctl: name=web_app state=restarted
    - supervisorctl: name=celeryd state=restarted

    - name: Install nginx
      apt: name=nginx
    - name: copy nginx config file
      template: >
        src=files/nginx-app.conf
        dest=/etc/nginx/sites-available/default
    - name: enable configuration
      file: >
        dest=/etc/nginx/sites-enabled/default
        src=/etc/nginx/sites-available/default
        state=link
    - service: name=nginx state=restarted
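After this play finishes it's worth confirming that supervisor actually brought both programs up. A quick ad-hoc check (sudo is needed to talk to supervisor's socket):
$ ansible app_servers -i inventory --sudo --ask-sudo-pass -a "supervisorctl status"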
The load balancer
The load balancer has a simpler task list:
- Block all incoming traffic except TCP ports 22, 80 and 443; rate-limit ssh.
- Install nginx and copy in the self-signed certificate and key.
- Copy in the load balancer configuration and launch nginx.
- name: Setup Load balancer(s)
  hosts: load_balancers
  sudo: True
  tasks:
    - ufw: state=enabled logging=on
    - ufw: direction=incoming policy=deny
    - ufw: rule=limit port=ssh proto=tcp
    - ufw: rule=allow port=22 proto=tcp
    - ufw: rule=allow port=80 proto=tcp
    - ufw: rule=allow port=443 proto=tcp

    - apt: name=nginx
    - name: copy nginx config file
      template: >
        src=files/nginx-load-balancer.conf
        dest=/etc/nginx/sites-available/default
    - copy: src=files/nginx.key dest=/etc/nginx/ssl/
    - copy: src=files/nginx.crt dest=/etc/nginx/ssl/
    - name: enable configuration
      file: >
        dest=/etc/nginx/sites-enabled/default
        src=/etc/nginx/sites-available/default
        state=link
    - service: name=nginx state=restarted
Running the playbook
I used the following command to run the playbook and setup the cluster:
$ ansible-playbook -i inventory --ask-sudo-pass playbook.yml
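If you later want to re-run only part of the setup, ansible-playbook's --limit flag restricts the run to a single group or host from the inventory, for example:
$ ansible-playbook -i inventory --ask-sudo-pass --limit app_servers playbook.yml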
I then tested that I could communicate with the app servers via the load balancer. If --insecure isn't passed to curl the request won't complete because curl refuses to trust self-signed SSL certificates by default:
$ curl --insecure https://lb
k2b71#v!l0_sf7y$0)x(=cw2u_^q05etbf9ediptp(#0m+&=^0
81jy$7n=!3ay%p3o%$e!iv8hknbuyl64*o-sue1xcgygp^owlb
fne-$j$^qyv*^me3r5kx=p^#*+y!t)gq!^a)9_dhs4afcx2x!2
7s5@po!&)zo#ca=16-o0gmv!440%1$q2xgne+uerpp7@*bt*l8
m!y*$2o)8r(tmf!b(*72$knb$&(gt1jspn&h4tu^s#9-3(+x&b
s#(vta0x68#4ihpw1sds06=fjcj9!am8c4c32zy95_0=%==$s(
-j(3pnb^4x)##(^@n)&)fe3#zl2mb&(s1qj5#)9%+ng6%sj%7n
c02$ahq#t$t)1s12-nj!yolz+v687zpefug_o7!+w7055gt5g$
7j8v%$)o50ch(-^#q3^7(dtgl3lvg2orirk$e54l&k89jxj#-1
g@^_eanx#*@4&8kg!xi(va^_@@4xyjz7h497$iw*1=^sb797il
88hmb=+c9+^#2r3x$e7nl)nlf8rb^