Paul Hammant's Blog: Elastic Environments in Source-Control with Ansible
This is a group effort: Jefferson Girao helped me get the design right, and David McHoull and Deluan Quintao from our ThoughtWorks’ Toronto office worked through to an implementation. Additionally, David has written much of the blurb below.
Infrastructure As Code (IaC): App Environments
IaC pertains to much more than just environments for business apps, but this article focuses on just that. The elasticity we want in this context is to provision an environment quickly, and to de-provision it quickly when it is no longer needed. Quick re-creation is assumed too, which means application dev teams could relinquish environments without worrying about delays or red tape in getting them back. That's the key business benefit for very large organizations.
Say we have a corporation with an application (public or private) that's built by Continuous Integration (CI). Each application could have more than one environment:
- The Environment that CI uses for automated functional testing
- Shared Dev
- QA
- UAT
- Staging
- Production
The environments that the application could be deployed into are somewhat orthogonal to the application itself. Indeed, those environments won't all have the same version of the app installed in them. We could easily have more than one QA environment if we're doing concurrent development of consecutive releases.
In recent years the industry has used Puppet (and the like) for much of the configuration of machines in a stack. DevOps best-practitioners use a source-control system for that, and quite often one that is separate from the application's source repo. I think it is fair to say that you can't have IaC without that being under source control too, and that engineered systems that don't allow round-trip editing of blueprints/scripts in source-control will become obsolete.
Provisioning Goal
Say we’re wanting to make a QA2 environment from unused VMs managed via the API of a hosting org or service, for a ‘Pet Store’ Ruby-on-Rails app, like so:
Or maybe we want to split that into two VMs for the QA2 environment:
Either way, browsers (Firefox shown) can’t use the application until provisioning and app deployment are complete.
Tersest Environment Blueprints
Puppet isn't especially terse though, and therein lies the reason for this blog entry. Consider a very terse meta definition language, and a 'pet store' application example for it, like so:
machines:
  db:
    - mysql
  www:
    - ruby19
    - passenger
    - nginx
network:
  domain: qa1.example.com
(Filename petstore/qa1.yml)
You can see that we have keywords for Ruby 1.9, Passenger, Nginx and MySQL in this design. YAML is very human readable of course, and an ideal way to still meet the "as code" requirement, as well as being very compatible with source-control. We are also not saying where the environment will be provisioned; we are leaving that to the DevOps team. The where/how is controlled outside this YAML file, and could be changed without changing this blueprint.
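If we wanted the whole QA2 stack on a single VM instead of two, the same grammar covers it. This is only a sketch: the single 'www' machine carrying all four roles, the qa2 domain, and the filename are my assumptions, not taken from the proof-of-concept repo:

machines:
  www:
    - mysql
    - ruby19
    - passenger
    - nginx
network:
  domain: qa2.example.com
(Hypothetical filename petstore/qa2.yml)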
We have a repo on Github, that holds the blueprints that we used to prove the concept: github.com/deluan/petshop
Here is roughly how we want to use that on the command line:
petshop$> auto-environment qa1.yml ../our_roles_dictionary -q
db.qa1.example.com provisioned
    with common, mysql
www.qa1.example.com provisioned
    with common, ruby19, passenger, nginx
The ‘auto-environment’ command is a Ruby Gem, though it could easily be Python etc.
I'll describe 'our_roles_dictionary' shortly. Ansible is the technology that actually does the provisioning, so we generate an Ansible playbook from that blueprint:
- include: /path/to/ansible/our_roles_dictionary/create.yml

- name: Configuring db.qa1.example.com
  hosts: db
  roles:
    - common
    - mysql

- name: Configuring www.qa1.example.com
  hosts: www
  roles:
    - common
    - ruby19
    - passenger
    - nginx
The include line (line #1 above) refers to a playbook in the roles dictionary that is responsible for provisioning the hosts. In this case it launches instances on EC2, but it could just as well be using Rackspace, Vagrant, or anything else. The rest of the generated playbook lists the roles that should be configured on each machine. The lines "hosts: db" and "hosts: www" define which group of hosts each set of roles should be configured on. In our case, since we don't know the IPs or public hostnames of the instances beforehand, we have to build these host groups dynamically, adding the IP address of each instance to the appropriate group as it is launched (or found already running). The Ansible EC2 module makes this easy using "register" and "with_items":
- name: Launch instance
  local_action:
    module: ec2
    # the {{ }} values below stand in for variables supplied by the roles dictionary
    id: "{{ machine_name }}"
    instance_tags: "{{ instance_tags }}"
    keypair: "{{ keypair }}"
    instance_type: t1.micro
    region: us-east-1
    image: ami-93f4a8fa
    group: "{{ security_group }}"
    wait: yes
  register: ec2

- name: Add new instance to host group
  local_action: add_host hostname={{ item.public_ip }} groupname={{ machine_name }} ansible_ssh_user=ubuntu
  with_items: ec2.instances
A Dictionary of Roles
To accomplish this, Deluan and David created another Github project, github.com/deluan/petshop-rails-ansible, with all the roles and tasks needed to install MySQL, Ruby, Passenger, Nginx, and their dependencies. That project also provides an Ansible playbook that handles the provisioning of the machines: in this case by launching instances on EC2, opening the appropriate ports, then configuring DNS records through Amazon's Route 53 (which is great, but has usage costs). Roles are Ansible's reusable, template-like building blocks. After a bit of an initial learning curve getting their heads around Ansible and its conventions, they found the roles fairly easy to write. The EC2 tasks in particular were very straightforward, thanks to modules already included with Ansible.
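To give a feel for what one of those roles contains, here is a minimal sketch of an Nginx role's tasks file. It assumes Ubuntu and Ansible's stock apt and service modules, and is illustrative rather than a copy of the repo's actual role:

- name: Install nginx
  apt: pkg=nginx state=present update_cache=yes

- name: Ensure nginx is running and comes back after a reboot
  service: name=nginx state=started enabled=yes
(Hypothetical filename roles/nginx/tasks/main.yml)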
Above I was referring to the idealized "our_roles_dictionary", but for the proof of concept it was 'petshop-rails-ansible'. The bigger dictionary would grow out of this, and we'd rename it later as we subsumed a second and subsequent applications.
One blueprint per environment
An environment ‘QA1’ was detailed above, alluding to one YAML file per environment:
petshop/
    dev.yml
    qa1.yml
    qa2.yml
    staging.yml
    production.yml
Benefits:
- We can diff environments on the command line.
- We can search (grep, ack) through a whole tree looking for things.
Of course there are benefits from being in source-control generally, and much of what I wrote in Very Small Data applies.
With or without the application itself?
The provisioning technology could easily deploy the application too, after setting up the base OS and installing packages. That will work if the application is in a binary store and 'latest' is identifiable. Alternatively, the provisioning app could perfectly set up the base environment, but with no apps installed and no table-shapes in the database. In that case a regular CI tool could have an on-demand deployment capability with a drop-down of environments (subject to permissions) to deploy into. That would pick up where the provisioning app left off, deploying the app itself as well as doing the schema/table-shapes work. Indeed, that CI on-demand capability should not necessarily require freshly provisioned environments, even if it can also work that way.
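To make that hand-off concrete, a CI-triggered deploy play might look something like the sketch below. The repository URL, paths, and the Passenger-style restart are assumptions for illustration, not taken from the proof-of-concept repo:

- name: Deploy the pet store app onto the www hosts
  hosts: www
  tasks:
    - name: Fetch the application code at the requested revision
      git: repo=git://example.com/petstore.git dest=/srv/petstore version=master
    - name: Install gem dependencies
      command: bundle install --deployment chdir=/srv/petstore
    - name: Apply schema/table-shape changes
      command: bundle exec rake db:migrate chdir=/srv/petstore
    - name: Tell Passenger to restart the app
      command: touch /srv/petstore/tmp/restart.txt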
Next Steps
We might get time to push forward with:
De-provisioning
Add the ability to de-provision hosts. An example session could look something like this (note the -d parameter):
$> time auto-environment -d environments/qa2.yaml
www.qa2.petstore.dev.example.com de-provisioned
db.qa2.petstore.dev.example.com de-provisioned
-> DNS entries might remain cached for a while
1.02 real 0.60 user 0.10 sys
Usage Database
De-provisioning could work by just using the domain names implicit in the YAML to locate the allocated VMs, and from that interoperating with the hosting system to de-provision them. The YAML might have been updated since the last provisioning though, so you'd really want to interrogate a real database that holds usage info.
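The terminate step itself is simple with Ansible's stock EC2 module; the harder part is reliably finding the instance IDs, which is exactly what the usage database would hold. A hedged sketch, where 'instance_ids_for_env' is a hypothetical variable looked up from that database:

- name: Terminate the instances backing this environment
  local_action:
    module: ec2
    state: absent
    instance_ids: "{{ instance_ids_for_env }}"
    region: us-east-1
    wait: yes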
True auto-provisioning
There's no reason why a CI daemon could not spot changes to appname/envname.yml files and automatically de-provision and re-provision an environment. Not for production perhaps, but the only reason it would not work for dev/qa/uat/staging is the implicit dropping of the data accumulated since the last provisioning. Truly Agile teams have canned starter data in source-control as part of the application build/deploy: records in the style of "Fred Flintstone" test data. De-provision/re-provision is infinitely easier than applying deltas to an existing environment, and is actually much more attractive as it is perfectly repeatable.
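As an example of what that canned starter data might look like for a Rails app like the pet store, a fixture checked into the application repo could be as small as this (the file name and fields are hypothetical):

fred:
  first_name: Fred
  last_name: Flintstone
  email: fred@bedrock.example.com
(Hypothetical filename test/fixtures/customers.yml)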
Practicalities
Whereas you might have been happy with www.qa.petstore.dev.your_company.com for the QA environment, production would be www.petstore.your_company.com or petstore.your_company.com. Tying the Amazon Route 53 service to *.dev.your_company.com would seem smart, but maybe you want real DNS to control the production domains, without a per-use fee. That's easier with a dev sub-domain, and it would allow you to neatly avoid the corporate SecOps people while developing. Route 53 is very scriptable over REST, but is the same true of your regular choice of DNS provider? Not always is the answer.
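For the Route 53 side, registering a host really is a one-liner with Ansible's stock route53 module. A minimal sketch, where the zone, record name, and the 'www_public_ip' variable are assumptions for illustration:

- name: Register the www host for this environment in DNS
  local_action: route53 command=create zone=dev.your_company.com record=www.qa2.petstore.dev.your_company.com type=A ttl=300 value={{ www_public_ip }} overwrite=yes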
This is duplicating the work of ..
.. A team that my colleague Sam Newman managed a year ago, for a project called Phoenix. Sam's one is pretty similar, though it had all environments in one file. What it had not, so far, got round to coding was the database that chronicles usage.
Note: Every company of sufficient size and age has a project called Phoenix. Or two.
Update: July 30, 2014
Hashicorp have a new technology Terraform which is in the same space - terse blueprints under source-control - yay!