Sometimes you need to be notified when files are added to an S3 bucket, for example when you want to mirror the bucket by downloading each new file as soon as it’s created or updated.
One such case is fetching your ELB logs as soon as they are available so they can be processed by Logstash, something we’ve blogged about previously.
The most flexible way to do this is to have the S3 bucket publish its ObjectCreated events to an SNS topic, which in turn delivers them to a subscribed SQS queue. You can then consume this queue within your infrastructure.
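To make the chain concrete, here is a minimal sketch of the three documents that typically wire S3 to SNS to SQS: a topic policy that lets the bucket publish, a queue policy that lets the topic deliver, and the bucket notification configuration itself. The function names are hypothetical and the ARNs are supplied by the caller; this is not necessarily what create_queue.sh does internally.

```python
import json


def topic_policy(topic_arn: str, bucket: str) -> str:
    """SNS access policy letting the given S3 bucket publish to the topic."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sns:Publish",
            "Resource": topic_arn,
            "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket}"}},
        }],
    })


def queue_policy(queue_arn: str, topic_arn: str) -> str:
    """SQS access policy letting the SNS topic deliver messages to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def bucket_notification(topic_arn: str) -> dict:
    """Notification configuration sending every ObjectCreated event to the topic."""
    return {
        "TopicConfigurations": [
            {"TopicArn": topic_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }
```

With boto3, for instance, these could be applied via `set_topic_attributes`, `set_queue_attributes` and `put_bucket_notification_configuration`, with the queue subscribed to the topic using `subscribe` and the `sqs` protocol.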
With a couple of scripts it’s super easy to do this in just a few commands, so we thought we’d share the process of getting it up and running.
bucket=mybucket
queue=s3-object-created-$(echo "$bucket" | tr '.' '-')
./create_queue.sh "$bucket" "$queue"
nohup ./mirror_bucket.py /my/mirror/dir "$queue" "$bucket" &
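The internals of mirror_bucket.py aren't shown here, but any consumer of the queue has to unpack the same message format. As a rough sketch (the function name `created_objects` is hypothetical), this is how one SQS message body might be decoded into the bucket and key of each new object. It assumes the default SNS delivery, where the S3 event arrives as a JSON string in the envelope's `Message` field, and it also accepts the bare event in case raw message delivery is enabled.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def created_objects(sqs_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for ObjectCreated records in one SQS message."""
    envelope = json.loads(sqs_body)
    # SNS normally wraps the S3 event as a JSON string under "Message".
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope

    objects = []
    # Messages without "Records" (such as the S3 test event) yield nothing.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Each returned pair can then be downloaded into the mirror directory, and the SQS message deleted once the download succeeds.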
That’s it! New files added to the bucket will now be mirrored into /my/mirror/dir. You can then have Logstash (for example) consume these logs and delete them either after processing or after a set period of time, for example:
find /my/mirror/dir -mindepth 1 -mtime +5 -delete
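If you would rather run the cleanup from Python (say, inside the mirroring process), a rough equivalent might look like the sketch below. The function name `delete_old_files` is hypothetical. It mirrors the `-mtime +5` rule, which matches files whose age in whole days is greater than 5, but unlike `find -delete` it removes only files and leaves directories in place.

```python
import os
import time
from typing import List, Optional


def delete_old_files(root: str, days: int = 5, now: Optional[float] = None) -> List[str]:
    """Delete files under root whose age in whole days exceeds `days`."""
    now = time.time() if now is None else now
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Age is truncated to whole 24-hour periods, as find's -mtime does.
            age_days = int((now - os.lstat(path).st_mtime) // 86400)
            if age_days > days:
                os.remove(path)
                removed.append(path)
    return removed
```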
We embed this within a Docker container, so when the container comes up it starts downloading new log files from S3 to be processed by Logstash. Easy!