Instantly mirror an S3 bucket

Sometimes you need to be notified when files are added to an S3 bucket, for instance when you want to instantly mirror
the bucket, downloading each new file as soon as it’s created or updated.

One use case is fetching your ELB logs as soon as they are available so that logstash can process them, something we’ve blogged about previously.

The most flexible way to do this is to have the S3 bucket publish its ObjectCreated events to an SNS topic, which in turn delivers them to a subscribed SQS queue. You can then consume this queue from within your infrastructure.
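To make the chain a little more concrete, here is a minimal sketch of how a consumer might unpack one message taken off the queue. It is not the code from our scripts; the function name extract_s3_objects is made up for illustration. It assumes the usual shapes: SNS wraps the S3 event as a JSON string in a "Message" field (unless raw message delivery is enabled on the subscription), and S3 URL-encodes object keys in its notifications.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_body):
    """Return (bucket, key) pairs for ObjectCreated records in one SQS message body.

    Hypothetical helper for illustration only.
    """
    envelope = json.loads(sqs_body)

    # With SNS in the middle, the S3 event is a JSON string under "Message".
    # With raw message delivery, the body is already the S3 event itself.
    payload = envelope.get("Message")
    event = json.loads(payload) if isinstance(payload, str) else envelope

    objects = []
    # Test notifications carry no "Records" list, so they yield nothing.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Keys arrive URL-encoded (spaces become "+"), so decode them.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects
```

In a real consumer the body would come from whatever SQS client you use, and each returned pair would be downloaded into the mirror directory before the message is deleted from the queue.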

With a couple of scripts it’s super easy to do this in just a few commands, so we thought we’d share the process of getting it up and running.

bucket=mybucket
queue=s3-object-created-$(echo "$bucket" | tr '.' '-')
./create_queue.sh "$bucket" "$queue"
nohup ./mirror_bucket.py /my/mirror/dir "$queue" "$bucket" &

That’s it! Now all new files added to the bucket will be mirrored into /my/mirror/dir. You can then have logstash (for example) consume these logs, and delete them after processing or after a period of time:

find /my/mirror/dir -mindepth 1 -mtime +5 -delete
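If you would rather do this cleanup from the mirroring process itself, the same idea can be sketched in Python. This is only an illustration: prune_old_files is a hypothetical name, and unlike find with -delete it removes regular files only and leaves directories in place. The age check mirrors how find treats -mtime +5, counting whole 24-hour periods, so a file goes once it is at least six days old.

```python
import os
import time


def prune_old_files(root, max_age_days=5, now=None):
    """Delete files under root older than max_age_days whole days; return their paths.

    Hypothetical helper for illustration only.
    """
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Age in whole days, truncated, as find does for -mtime.
            age_days = int((now - os.path.getmtime(path)) // 86400)
            if age_days > max_age_days:
                os.remove(path)
                removed.append(path)
    return removed
```

Calling prune_old_files("/my/mirror/dir") periodically would then stand in for the find command above.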

We embed this within a docker container, so when the container comes up it starts downloading new log files from S3 to be processed by logstash. Easy!
