Stack Policies on Amazon CloudFormation: Making Devs’ Lives a Little Easier
Here at import·io we are heavy users (and big fans) of Amazon CloudFormation. Since we first shipped to production, we have used CloudFormation to manage the entire stack of resources that makes up a single instance of the import·io platform.
We have since expanded our resources and tooling to make further use of these templates, for example to provision staging environments.
However, we have always needed an extremely careful process around any update deployed to a CloudFormation stack once it has been created.
Updates can be needed for any number of reasons, among them fixing ops “bugs” such as incorrect firewall configurations, deploying software changes to servers, updating resources such as load balancers, provisioning new infrastructure and deprecating old resources. Previously, for every update to a CloudFormation stack, we had to carefully diff the new template against the old one to make sure we knew exactly which changes would take place. Even updates to staging stacks needed careful scrutiny, as provisioning a new stack is neither free nor quick.
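Part of that diffing step can be mechanised. Below is a minimal Python sketch, not our actual tooling, that compares the Resources sections of two parsed templates and reports which logical IDs were added, removed or changed. The resource names (WebServerGroup, DatabaseMaster, StagingBucket) and the heavily trimmed templates are hypothetical, and a structural diff like this cannot tell you whether CloudFormation will modify a resource in place or replace it.

```python
import copy


def diff_resources(old_template, new_template):
    """Naively compare the Resources sections of two parsed templates."""
    old = old_template.get("Resources", {})
    new = new_template.get("Resources", {})
    return {
        # Logical IDs only present in the new template.
        "added": sorted(new.keys() - old.keys()),
        # Logical IDs that the new template drops.
        "removed": sorted(old.keys() - new.keys()),
        # Logical IDs present in both whose definitions differ in any way.
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical, heavily trimmed templates, already parsed from JSON into dicts.
old_template = {
    "Resources": {
        "WebServerGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {"MinSize": "1", "MaxSize": "4"},
        },
        "DatabaseMaster": {
            "Type": "AWS::EC2::Instance",
            "Properties": {"InstanceType": "m1.large"},
        },
    }
}
new_template = copy.deepcopy(old_template)
new_template["Resources"]["WebServerGroup"]["Properties"]["MaxSize"] = "8"
new_template["Resources"]["StagingBucket"] = {"Type": "AWS::S3::Bucket"}

print(diff_resources(old_template, new_template))
# {'added': ['StagingBucket'], 'removed': [], 'changed': ['WebServerGroup']}
```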
Recently, though, our ops lives have become much easier thanks to a new CloudFormation feature that can block updates to stack resources you don’t expect to change.
These so-called “stack policies” are JSON documents that permit or prohibit specific update operations on individual resources, or groups of resources, within an existing CloudFormation stack. They do not restrict stack creation, only updates.
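As an illustration, here is a small stack policy built as a Python dict and serialised to JSON. Each statement has an Effect (Allow or Deny), one or more update actions (Update:Modify, Update:Replace, Update:Delete, or the wildcard Update:*), a Principal, which must be "*" in a stack policy, and a Resource given as LogicalResourceId/<id> or a wildcard. The logical ID DatabaseMaster is hypothetical. This sketch allows every update except replacing or deleting that one resource; an explicit Deny takes precedence over any Allow.

```python
import json

# "DatabaseMaster" is a hypothetical logical resource ID used for illustration.
stack_policy = {
    "Statement": [
        {
            # Permit every kind of update on every resource in the stack...
            "Effect": "Allow",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
        },
        {
            # ...except replacing or deleting the database master.
            # An explicit Deny overrides the Allow above.
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": "LogicalResourceId/DatabaseMaster",
        },
    ]
}

# The JSON text is what actually gets supplied to CloudFormation.
policy_body = json.dumps(stack_policy, indent=2)
print(policy_body)
```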
Why exactly would we want this? Our CloudFormation template is getting pretty big (although we keep trimming it as we rationalise our infrastructure), and accidentally editing the wrong resource could cause chaos. Disabling a firewall rule or unintentionally restarting our database master could lead to infrastructure headaches and availability problems that we are very keen to avoid. If your architecture includes components that aren’t highly available, taking them out of commission by accident during an update could cause anything from downtime to data loss.
If you do not specify a stack policy when updating a stack through the API or the AWS Console, the behaviour is unchanged: the changes in the CloudFormation JSON template may perform any action on any resource. Once you specify a stack policy, however, every resource is protected from modification by default, and it is up to you to explicitly allow updates to a specific resource or set of resources. This makes it very easy to write a short stack policy for an update that ensures you don’t take an irreversible action on a resource you would really like to keep around (e.g. resetting the disk on your database server; you do have automatic backups, right?).
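To make the default-deny rule concrete, the Python function below is a simplified model of how a stack policy is evaluated, written for illustration only. It is not CloudFormation’s implementation and it ignores the Condition, NotAction and NotResource elements. With no policy every update goes through; with a policy, an explicit Deny wins, otherwise an update needs a matching Allow, and anything left unmatched is implicitly denied. Wildcards are approximated with Python’s fnmatch, and the helper names and logical IDs are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    # Action and Resource may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def _statement_matches(statement, action, logical_id):
    resource = f"LogicalResourceId/{logical_id}"
    action_ok = any(fnmatchcase(action, p) for p in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, p) for p in _as_list(statement["Resource"]))
    return action_ok and resource_ok


def is_update_allowed(policy, action, logical_id):
    """Simplified model of stack policy evaluation (illustration only)."""
    if policy is None:
        # No stack policy: any update may act on any resource.
        return True
    matching = [s for s in policy["Statement"] if _statement_matches(s, action, logical_id)]
    if any(s["Effect"] == "Deny" for s in matching):
        # An explicit Deny always wins.
        return False
    # Otherwise an explicit Allow is required; no match means implicit deny.
    return any(s["Effect"] == "Allow" for s in matching)


# Hypothetical policy for a deploy that should only modify the web servers.
deploy_policy = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "Update:Modify",
            "Principal": "*",
            "Resource": "LogicalResourceId/WebServerGroup",
        }
    ]
}

print(is_update_allowed(deploy_policy, "Update:Modify", "WebServerGroup"))   # True
print(is_update_allowed(deploy_policy, "Update:Replace", "WebServerGroup"))  # False
print(is_update_allowed(deploy_policy, "Update:Modify", "DatabaseMaster"))   # False
print(is_update_allowed(None, "Update:Delete", "DatabaseMaster"))            # True
```

Note the third call: DatabaseMaster is never mentioned in the policy, yet the update is refused, which is exactly the protection described above.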
For now, we don’t make extensive use of stack policies when updating our environments, but we plan to adopt them more widely very soon. Alongside rationalising our current process and tooling, we are also exploring automated tooling to help us generate succinct stack policies for each update phase. As a foundation for this, we are very interested in tools such as troposphere, an open-source project for generating CloudFormation templates with Python.
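A hypothetical starting point for such tooling might look like the sketch below: given the logical IDs an update is expected to touch, it emits a policy that allows only in-place modification (Update:Modify) of those resources, leaving everything else implicitly denied. It uses plain dicts rather than troposphere, and the function names are our own. One way to apply the result is the StackPolicyDuringUpdateBody parameter of boto3’s CloudFormation update_stack call, which supplies a temporary policy for a single update; check the CloudFormation documentation for how it interacts with any policy already set on the stack.

```python
import json


def policy_for_update(allowed_logical_ids, actions=("Update:Modify",)):
    """Hypothetical helper: allow `actions` only on the named resources.

    Every resource not listed is implicitly denied by the stack policy.
    """
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Principal": "*",
                "Resource": [
                    f"LogicalResourceId/{logical_id}"
                    for logical_id in sorted(allowed_logical_ids)
                ],
            }
        ]
    }


def update_with_policy(cfn_client, stack_name, template_body, allowed_logical_ids):
    """Hypothetical wrapper around UpdateStack.

    `cfn_client` is assumed to be a boto3 CloudFormation client, e.g.
    boto3.client("cloudformation"). The generated policy is passed as a
    temporary policy for this update only.
    """
    policy = policy_for_update(allowed_logical_ids)
    return cfn_client.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        StackPolicyDuringUpdateBody=json.dumps(policy),
    )


# Example: a software deploy that should only touch the two server groups.
policy = policy_for_update({"WebServerGroup", "AppServerGroup"})
print(json.dumps(policy, indent=2))
```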
Soon we’ll be giving more presentations and writing more posts about AWS and how we use it, so keep an eye on our blog or Twitter for updates.