At import.io, we believe that the best DevOps is NoOps.
That doesn’t mean that we don’t like DevOps people. On the contrary, we think they rock!
But ideally, we want everyone to be able to automate their work wherever possible. Instead of spending time on repetitive and mundane tasks, we push those jobs onto computers – leaving ourselves more time to work on great features.
With that in mind, we are starting to adopt microservice patterns as we scale our engineering efforts, so that new features are delivered as components separate from the main platform. This lets us iterate on new functionality quickly, test different technology stacks and bring people into the development process who do not have inside knowledge of the platform.
Mockup of Scheduled APIs
Our latest project, Scheduled APIs (which will let you run your Import.io APIs on a schedule), gave us a chance to revise our technology stack and look at new paths for building long-lasting components. Specifically, we used a pair of new AWS services: AWS Lambda and API Gateway.
What is AWS Lambda?
AWS Lambda offers a way to build backend solutions without the hassle of virtual machines, containers or any infrastructure for that matter. It lets us define one or more functions that are hosted on the Amazon platform and invoke them through the AWS SDK or, for example, through an HTTP endpoint. Functions can also run in response to certain events, including changes to a DynamoDB table, a new message on an SNS topic and updates to an S3 bucket.
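To make the idea concrete, here is a minimal sketch of what such a function can look like in Python. The handler name and the `name` field in the event are our own illustrative choices, not part of the Scheduled APIs code base; the only real convention assumed is that Lambda calls a function with an event payload and a context object.

```python
def handler(event, context):
    """Entry point that the platform calls with the triggering event."""
    # 'event' carries the trigger payload; 'context' holds runtime metadata.
    # The 'name' key is a hypothetical field used only for this illustration.
    name = event.get("name", "world")
    return {"message": f"Hello, {name}!"}


if __name__ == "__main__":
    # Local invocation; on Lambda the platform supplies both arguments.
    print(handler({"name": "import.io"}, None))
```

There is no server setup anywhere in the code: the function body is the whole deployable unit.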
Such technology allows us to create really slick reactive application architectures, where functions are triggered by a set of events happening on the platform. Add to that theoretically unlimited scalability and a really appealing pricing model, and you get a great platform for backend solutions.
What is API Gateway?
API Gateway, on the other hand, allows us to easily expose Lambda functions through REST APIs. It is a configuration-only solution that lets us quickly define endpoints, their required parameters and their set of responses. It works somewhat like a custom nginx setup, where you define how traffic to your endpoints should be routed to the proper functions. In addition to routing to Lambdas, API Gateway can also route to basically any other AWS service, or function as a simple proxy to any HTTP(S) address.
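The routing idea can be modelled in a few lines of Python. This is a toy model of what a gateway configuration expresses (endpoints, required parameters, possible responses), not the API Gateway API itself; the paths, parameter names and functions are all hypothetical.

```python
def create_schedule(params):
    return 201, {"created": params["apiId"]}


def delete_schedule(params):
    return 204, None


# (method, path) -> (function, required parameter names); all names hypothetical.
ROUTES = {
    ("POST", "/schedules"): (create_schedule, ["apiId", "interval"]),
    ("DELETE", "/schedules"): (delete_schedule, ["scheduleId"]),
}


def dispatch(method, path, params):
    """Route a request to its function, rejecting unknown or incomplete calls."""
    route = ROUTES.get((method, path))
    if route is None:
        return 404, {"error": "no such endpoint"}
    function, required = route
    missing = [name for name in required if name not in params]
    if missing:
        return 400, {"error": "missing parameters: " + ", ".join(missing)}
    return function(params)
```

With the real service, the table and the validation live in configuration, so none of this dispatch code has to be written or hosted by us.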
Creating a serverless architecture
The “serverless” bit doesn’t actually mean that there are no servers required to run the new feature. It just means that we don’t have to deal with them because they are abstracted away by AWS Lambda as an internal component. That means we don’t have to worry about scaling, multi-server communication and other problems related to distributed systems. Lambdas do everything for us!
Below you can see the component diagram of our new Scheduled API service.
Starting from the left, we have a set of APIs that allow us to communicate with the outside world. These APIs let you create and delete schedules for things like your bulk extract configuration.
Each of the REST methods is a separate Lambda function, with the API Gateway configuration defining an endpoint for that function. These functions communicate with a DynamoDB table to maintain the set of schedules.
Additionally, changes to our table trigger another Lambda function. That function, which we called “Execution Store Creator”, is responsible for maintaining a sorted set of upcoming schedule executions in a Redis collection (here we are using the ElastiCache version).
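A rough sketch of two such functions is shown below. To keep it runnable without an AWS account, a small in-memory class stands in for the DynamoDB table; the class, the handler names and the item fields (`scheduleId`, `apiId`, `intervalSeconds`) are assumptions made for illustration only.

```python
import uuid


class InMemoryTable:
    """Stand-in for the DynamoDB table so the sketch runs without AWS."""

    def __init__(self):
        self.items = {}

    def put_item(self, item):
        self.items[item["scheduleId"]] = item

    def delete_item(self, schedule_id):
        return self.items.pop(schedule_id, None)


TABLE = InMemoryTable()


def create_schedule(event, context):
    """Handler behind a hypothetical 'create schedule' endpoint."""
    item = {
        "scheduleId": str(uuid.uuid4()),
        "apiId": event["apiId"],
        "intervalSeconds": event["intervalSeconds"],
    }
    TABLE.put_item(item)
    return item


def delete_schedule(event, context):
    """Handler behind a hypothetical 'delete schedule' endpoint."""
    removed = TABLE.delete_item(event["scheduleId"])
    return {"deleted": removed is not None}
```

Each handler does one thing, which is what makes it practical to deploy every REST method as its own function.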
The idea is that a quick peek at the top of our collection should tell us which schedules should be fired.
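Here is a minimal sketch of that idea. The record layout follows the general shape of DynamoDB stream events (`Records`, `eventName`, typed attribute values), but the attribute names `scheduleId` and `nextRun` are hypothetical, and the `SortedSet` class is only an in-memory stand-in for a Redis sorted set.

```python
class SortedSet:
    """Tiny in-memory stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self.scores = {}

    def add(self, member, score):
        # Comparable to ZADD: insert the member or update its score.
        self.scores[member] = score

    def remove(self, member):
        # Comparable to ZREM: drop the member if present.
        self.scores.pop(member, None)

    def peek(self):
        # Return the member with the lowest score without removing it.
        if not self.scores:
            return None
        member = min(self.scores, key=self.scores.get)
        return member, self.scores[member]


STORE = SortedSet()


def execution_store_creator(event, context):
    """Mirror table changes into the sorted set, scored by next run time."""
    for record in event["Records"]:
        schedule_id = record["dynamodb"]["Keys"]["scheduleId"]["S"]
        if record["eventName"] == "REMOVE":
            STORE.remove(schedule_id)
        else:  # INSERT or MODIFY
            next_run = float(record["dynamodb"]["NewImage"]["nextRun"]["N"])
            STORE.add(schedule_id, next_run)
```

Because the score is the next execution time, `peek()` always returns the schedule that is due soonest.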
Then the Execution Store Poller uses a new feature of Lambda functions (scheduled execution) to periodically peek at the top of our Execution Store collection and execute any schedules whose time has come. Next, we pass a message through an SQS queue to our batch execution nodes, which start preparing a fresh set of data.
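The polling step can be sketched as follows. This is a simplified model under our own assumptions: a `heapq`-based list stands in for the sorted execution store, a plain Python list stands in for the SQS queue, and the message fields are hypothetical. Rescheduling the next run of a recurring schedule is deliberately left out.

```python
import heapq
import time


def poll_due(store, queue, now=None):
    """Move every schedule whose time has come from the store onto the queue.

    'store' is a heap of (next_run, schedule_id) tuples, earliest first.
    'queue' is any list-like object standing in for the message queue.
    """
    now = time.time() if now is None else now
    fired = []
    # Only the top of the heap needs checking: if it is not due, nothing is.
    while store and store[0][0] <= now:
        next_run, schedule_id = heapq.heappop(store)
        queue.append({"scheduleId": schedule_id, "scheduledFor": next_run})
        fired.append(schedule_id)
    return fired
```

A short usage example:

```python
store, queue = [], []
for entry in [(30, "c"), (10, "a"), (20, "b")]:
    heapq.heappush(store, entry)
due = poll_due(store, queue, now=20)  # schedules "a" and "b" are due, "c" is not
```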
Managing everything with JAWS
Even though we don’t have to worry about setting up virtual machines or containers to maintain those components, we still have to manage numerous AWS services: S3, Lambda, API Gateway, DynamoDB, ElastiCache and SQS, to name a few.
JAWS, an open-source framework for building serverless applications on AWS, leverages AWS CloudFormation to create all the necessary components and uses its concept of stages to provide us with separate environments. Additionally, it allows our Lambda functions to have custom environment variables, takes care of building deployment packages and provides a wrapper for deploying API Gateway definitions.
We have a separate environment for our production and staging versions, as well as a set of stages that work as feature branches for all developers. That way everyone on the team can have their own playground.
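One way a function can take advantage of per-stage environment variables is to derive resource names from them, so the same code runs unchanged in every environment. The sketch below assumes a hypothetical variable called `STAGE` and a hypothetical naming pattern; it does not reflect how JAWS itself names variables or resources.

```python
import os


def table_name(base="schedules", environ=os.environ):
    """Build a stage-specific table name, e.g. 'schedules-staging'."""
    # 'STAGE' is a hypothetical variable name; 'dev' is an assumed default.
    stage = environ.get("STAGE", "dev")
    return f"{base}-{stage}"
```

A developer's personal stage would then resolve to its own table, leaving staging and production data untouched.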
If you’re thinking about adding AWS Lambda functions to your stack, definitely have a look at JAWS. It made this task much easier for us.
New Amazon Web Services features open up an exciting new way to think about backend solutions, one with potentially unlimited scalability and no infrastructure work. We are excited to see how this solution will work out in production. Already, we are starting to see benefits from the simplicity of the code base and a minimalistic approach to application development.
The schedule service still needs some time before it is available for public use, but we are already super excited about the potential use cases it brings for our users. At the same time, the set of technologies we use to deliver this feature is quite greenfield, so come back soon if you are interested in hearing more about our experiences.