Keeping Your Server Clocks Synced
EC2 servers are started in sync with the VM host’s clock. They are also configured to use the OS’s standard time servers – for Amazon Linux AMIs, these are *.amazon.pool.ntp.org, and for Ubuntu it’s *.ubuntu.pool.ntp.org.
This works well if your server is on “classic” EC2, where it can access the internet.
However, inside a VPC, which comes with increased security, only servers with Elastic IPs can connect to the internet through the gateway, and even then only if their security groups and the network ACLs permit it.
Of course, when you are dealing with clusters of servers, keeping their clocks in sync is of the utmost importance. In our tests, we were seeing significant drift, which could cause many issues given how many clustering technologies we work with.
However, as I alluded to above, it’s not really a case of just installing NTP. In fact, there are several architectural questions that need to be addressed prior to fully rolling it out.
VPCs are limited in the number of Elastic IPs – and therefore servers which can connect directly out to the internet – unless you have asked Amazon very nicely to have your limits raised. Additionally, as we have recently seen with attacks against popular online gaming platforms, NTP itself can present security issues – so it would not be ideal for us to have all of our servers using public upstream NTP servers.
How Does NTP Work?
NTP works on a distributed model where there is no single point of failure. It would be prohibitively expensive for one person or entity to support the volume of traffic required to keep all the world’s computer clocks in sync!
NTP servers are assigned tiers, or “strata”, which indicate how many steps they are from a reference clock. The most accurate NTP servers, designated stratum 1, are directly attached to reference clocks such as atomic clocks, GPS receivers or radio time signals (these are called “stratum 0” devices and must be directly linked to the stratum 1 computer). These servers are special in that they require dedicated hardware to operate correctly.
Stratum 2 servers are NTP servers that are synced directly from stratum 1 servers (those with accurate clocks attached). Stratum 3 servers are synced from stratum 2 servers, and the sequence continues until you reach the lowest level, stratum 15. Stratum 16 is a special designation which indicates unsynchronised. A server can end up there because none of its upstream servers has a good enough (that is, low enough) stratum, because synchronisation is still in progress, or because there are simply no servers available to synchronise to.
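The stratum rule can be sketched in a few lines of Python. This is a deliberate simplification: real NTP daemons weigh delay, jitter and reachability when choosing a peer, whereas the hypothetical `client_stratum` function below only looks at the lowest upstream stratum.

```python
UNSYNCHRONISED = 16  # stratum value meaning "not synchronised"


def client_stratum(upstream_strata):
    """Return the stratum a server would report, given its upstreams' strata.

    Simplified model: sync to the lowest-stratum usable upstream and
    report one more than that. Not the real NTP selection algorithm.
    """
    usable = [s for s in upstream_strata if 1 <= s < UNSYNCHRONISED]
    if not usable:
        return UNSYNCHRONISED  # nothing to synchronise to
    return min(min(usable) + 1, UNSYNCHRONISED)


if __name__ == "__main__":
    print(client_stratum([1]))     # synced from a stratum 1 server
    print(client_stratum([2, 3]))  # picks the stratum 2 upstream
    print(client_stratum([]))      # no upstream servers at all
```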
You can read more about NTP on Wikipedia.
What Setup Do We Want?
Because NTP has built-in support for this kind of hierarchical synchronisation, we can optimise our network configuration so that all the servers are in sync with each other and the rest of the world, without exposing all of them to the internet.
Inside a public-facing network of our VPC, we can designate an NTP server to sync to upstream servers on the internet. Then, we are able to configure all of the servers inside our VPC to sync to that one server, keeping everything in line.
In all of our current stacks, our designated NTP instance is stratum 2, which means it syncs directly from an upstream stratum 1 server. All of our other servers are therefore stratum 3, which still gives them excellent synchronisation.
Security and Networking
We have a dedicated NTP instance in our VPC’s public subnet. The network ACLs need to be configured to allow NTP traffic in both directions. The ACL for the public subnet containing the NTP server needs to allow NTP (UDP port 123) inbound from both the private subnets you wish to serve and the internet. For responses, we found that UDP ports 1024-65535 also needed to be open inbound (remember that network ACLs are stateless firewalls, meaning inbound replies to outbound traffic need to be explicitly permitted). The ACL also needs to allow NTP outbound on UDP port 123 so the server can make requests.
Every server to be NTP synced needs a security group applied which allows UDP 123 outbound to a security group applied to the NTP server. As each instance can only be in 5 security groups, we use one group to allow ops traffic such as NTP and SSH together. The NTP server’s security group only needs to allow NTP (UDP port 123) in from the aforementioned security group, as well as outbound NTP (UDP 123) to all addresses (0.0.0.0/0) to contact external NTP servers.
The final piece of the networking puzzle is related to the infrastructure. AWS makes it possible to easily update the AMIs that instances run through CloudFormation. However, whenever a replacement for a static instance is started, it will get a different IP address within the VPC. This is not usually a problem, as either the clustering software doesn’t need the specifics or you can use a load balancer. However, NTP is incompatible with the AWS load balancers (and it seems a bit of a waste to have a load balancer for only one instance), so we work around this problem by making sure the NTP server, even if it has a new instance provisioned, has the same IP within the VPC.
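As a rough sketch of those ACL rules, the Python below builds three `AWS::EC2::NetworkAclEntry` resources and prints them as CloudFormation JSON. The logical names (`PublicSubnetAcl`, `NtpInbound` and so on) and the rule numbers are assumptions made for illustration, and 0.0.0.0/0 is used to cover both the private subnets and the internet. You may well want tighter CIDR blocks.

```python
import json

UDP = 17  # IANA protocol number for UDP, as used by NetworkAclEntry


def acl_entry(rule_number, egress, port_from, port_to, cidr="0.0.0.0/0"):
    """Build one allow rule for a (hypothetical) ACL named PublicSubnetAcl."""
    return {
        "Type": "AWS::EC2::NetworkAclEntry",
        "Properties": {
            "NetworkAclId": {"Ref": "PublicSubnetAcl"},  # assumed logical name
            "RuleNumber": rule_number,                    # illustrative numbering
            "Protocol": UDP,
            "RuleAction": "allow",
            "Egress": egress,
            "CidrBlock": cidr,
            "PortRange": {"From": port_from, "To": port_to},
        },
    }


resources = {
    # NTP requests arriving at the NTP server
    "NtpInbound": acl_entry(100, False, 123, 123),
    # Replies to the NTP server's own outbound requests (ACLs are stateless)
    "NtpReplyInbound": acl_entry(110, False, 1024, 65535),
    # NTP traffic leaving the public subnet
    "NtpOutbound": acl_entry(100, True, 123, 123),
}

print(json.dumps(resources, indent=2))
```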
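The security group rules might look something like the sketch below, again generated as CloudFormation JSON from Python. It assumes two groups with the hypothetical logical names `OpsSecurityGroup` and `NtpServerSecurityGroup` are defined elsewhere in the template. The rules are written as standalone `AWS::EC2::SecurityGroupEgress` and `AWS::EC2::SecurityGroupIngress` resources, since two groups that reference each other inline would create a circular dependency.

```python
import json

NTP_PORT = 123


def ntp_rule(**extra):
    """Return rule properties for UDP 123, merged with group/peer settings."""
    rule = {"IpProtocol": "udp", "FromPort": NTP_PORT, "ToPort": NTP_PORT}
    rule.update(extra)
    return rule


def group_id(logical_name):
    """Reference the ID of a security group defined elsewhere in the template."""
    return {"Fn::GetAtt": [logical_name, "GroupId"]}


resources = {
    # Clients in the ops group may send NTP to the NTP server's group
    "OpsEgressNtp": {
        "Type": "AWS::EC2::SecurityGroupEgress",
        "Properties": ntp_rule(
            GroupId=group_id("OpsSecurityGroup"),
            DestinationSecurityGroupId=group_id("NtpServerSecurityGroup"),
        ),
    },
    # The NTP server accepts NTP from members of the ops group
    "NtpServerIngressFromOps": {
        "Type": "AWS::EC2::SecurityGroupIngress",
        "Properties": ntp_rule(
            GroupId=group_id("NtpServerSecurityGroup"),
            SourceSecurityGroupId=group_id("OpsSecurityGroup"),
        ),
    },
    # The NTP server may contact external NTP servers
    "NtpServerEgressUpstream": {
        "Type": "AWS::EC2::SecurityGroupEgress",
        "Properties": ntp_rule(
            GroupId=group_id("NtpServerSecurityGroup"),
            CidrIp="0.0.0.0/0",
        ),
    },
}

print(json.dumps(resources, indent=2))
```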
How it Works
The first step is to provision a new EC2 Network Interface, and when you do this you can specify its Private IP address. We use something easy to remember, like 10.0.0.10. These are examples of CloudFormation JSON snippets you could base a setup on.
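A minimal sketch of the network interface resource follows, built in Python and printed as CloudFormation JSON. The logical names `NtpNetworkInterface`, `PublicSubnet` and `NtpServerSecurityGroup` are placeholders for whatever your own template uses.

```python
import json

network_interface = {
    "NtpNetworkInterface": {  # hypothetical logical name
        "Type": "AWS::EC2::NetworkInterface",
        "Properties": {
            "Description": "Fixed-address interface for the NTP server",
            "SubnetId": {"Ref": "PublicSubnet"},  # assumed subnet resource
            # The easy-to-remember private address; it must fall inside
            # the CIDR range of the subnet above.
            "PrivateIpAddress": "10.0.0.10",
            "GroupSet": [
                {"Fn::GetAtt": ["NtpServerSecurityGroup", "GroupId"]}
            ],
        },
    }
}

print(json.dumps(network_interface, indent=2))
```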
You can then also attach an Elastic IP to that network interface, using an EIP Association.
Then associate the above EIP to the interface:
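As a sketch, the Elastic IP and its association could be expressed like this. The logical names are placeholders, and the association assumes a network interface resource called `NtpNetworkInterface` exists in the same template.

```python
import json

elastic_ip = {
    "NtpElasticIp": {  # hypothetical logical name
        "Type": "AWS::EC2::EIP",
        "Properties": {"Domain": "vpc"},  # allocate the address for use in a VPC
    },
    "NtpElasticIpAssociation": {
        "Type": "AWS::EC2::EIPAssociation",
        "Properties": {
            "AllocationId": {"Fn::GetAtt": ["NtpElasticIp", "AllocationId"]},
            # Bind to the interface rather than an instance, so the address
            # survives the instance being replaced.
            "NetworkInterfaceId": {"Ref": "NtpNetworkInterface"},
        },
    },
}

print(json.dumps(elastic_ip, indent=2))
```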
Finally, attach the network interface to the instance using a Network Interface Attachment.
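The attachment itself is a small resource. In the sketch below, `NtpInstance` and `NtpNetworkInterface` are assumed logical names, and device index 1 is an assumption that the interface is attached as the instance's second network device.

```python
import json

attachment = {
    "NtpInterfaceAttachment": {  # hypothetical logical name
        "Type": "AWS::EC2::NetworkInterfaceAttachment",
        "Properties": {
            "InstanceId": {"Ref": "NtpInstance"},
            "NetworkInterfaceId": {"Ref": "NtpNetworkInterface"},
            "DeviceIndex": "1",  # assumed: second device on the instance
            # Keep the interface (and its fixed IP) when the instance goes away
            "DeleteOnTermination": False,
        },
    }
}

print(json.dumps(attachment, indent=2))
```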
When the server starts, you can have it use only that network interface, which ensures it listens to the right sockets and can connect to the internet successfully.
Note the final line to set the DNS is required so that the NTP server can resolve the DNS of external NTP servers.
Final Server NTP Configuration
The final piece of the puzzle is simply to install and configure NTP on all of your AWS instances, using your favourite configuration management software or by updating your images. The NTP configuration required is very simple: remove all of the “server” directives, and add one pointing at the private IP address of your own NTP server.
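If you are scripting that change yourself rather than using a configuration management tool, the edit could look roughly like this. The function name `point_at_internal_server` and the sample configuration are made up for illustration, and the address 10.0.0.10 is the example private IP used earlier.

```python
def point_at_internal_server(conf_text, server_ip):
    """Drop every 'server' directive and add one for the internal NTP server."""
    kept = [
        line
        for line in conf_text.splitlines()
        # A directive line starts with the word 'server'; comments and
        # other settings are left untouched.
        if line.split()[:1] != ["server"]
    ]
    kept.append(f"server {server_ip}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "driftfile /var/lib/ntp/ntp.drift\n"
        "server 0.ubuntu.pool.ntp.org\n"
        "server 1.ubuntu.pool.ntp.org\n"
    )
    print(point_at_internal_server(sample, "10.0.0.10"), end="")
```

On most distributions the file to rewrite is /etc/ntp.conf, and the NTP service needs a restart afterwards to pick up the change.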
To check whether a server is synchronised, you can use the command ntpq -pn. This gives you an output like below, telling you that it is synchronised. Bear in mind that sometimes it can take your NTP server a little while to sync, so give it half an hour before seeing if you have a problem.
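If you want to automate that check, a small parser is enough. In `ntpq` peer listings, the peer the daemon is currently synchronised to is marked with a leading `*`. The sketch below looks for that line. The sample text is made-up illustrative output, not a capture from a real server, and `selected_peer` is a hypothetical helper name.

```python
# Illustrative, made-up output in the usual `ntpq -pn` column layout.
SAMPLE_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.0.0.10       192.0.2.1        2 u   35   64  377    0.412    0.087   0.052
"""


def selected_peer(ntpq_output):
    """Return details of the peer marked '*' (the sync source), or None."""
    for line in ntpq_output.splitlines():
        if line.startswith("*"):
            fields = line[1:].split()
            return {
                "remote": fields[0],
                "refid": fields[1],
                "stratum": int(fields[2]),
            }
    return None  # no peer selected yet, so the server is not synchronised


if __name__ == "__main__":
    peer = selected_peer(SAMPLE_OUTPUT)
    print("synchronised to", peer["remote"] if peer else "nothing yet")
```

To run it against a live server you could feed it the output of `subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout`.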