Effective Engineering in the Cloud: Using AWS for Product Growth

Our cloud-based backup service happily runs on AWS. As our customer base has scaled up, we continuously re-evaluate whether we are still getting what we expect out of AWS. We send a lot of money to Amazon each month, and there are plenty of blogs recommending that once you hit a certain scale, it is more efficient to self-host.

In a series of articles over the course of this month, I’ll detail how we use AWS and how it has helped us evolve and grow our business. The focus is on being responsive to our business needs rather than on spending the least possible amount on product hosting. This material comes from an Austin Web Architecture talk I recently gave, subtitled “How I Learned to Stop Worrying and Love AWS.”

Ultimately, as a startup company, we are not looking to optimize our hosting spend; instead, we are looking to build on an infrastructure that allows us to easily try new things and that will not lock us into long-term decisions. While we believe that we can predict the future (look at any startup’s pitch deck for the hockey stick revenue projection), experience has shown that we will change our minds a lot, always based on learning more about the problems we are trying to solve.

[Note: For the TL;DR crowd, you can stop here. The remaining articles in this series (along with all other posts you will find in the Cloud Technology category) are intended to provide an in-depth view into the technical side of Spanning’s development and engineering, so don’t try to read this at the next red light.]

Here at Spanning we are fond of saying, “Even simple things are hard at scale.” AWS is an instrumental part of our technical approach. We like the ability to pay for just what we use, and given our automation investment, standing up a many-node server cluster for a couple of hours to run a test costs us just a few dollars. This allows us to run multiple tests and refine them over and over in an effort to quickly converge on an answer.
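To make the “just a few dollars” arithmetic concrete, here is a rough back-of-the-envelope sketch. The node count, duration, and hourly rate below are hypothetical numbers picked for illustration, not our actual cluster size or AWS pricing, and the function name is made up for this example.

```python
def cluster_test_cost(nodes: int, hours: float, hourly_rate_usd: float) -> float:
    """Estimate the pay-per-use cost of a short-lived test cluster.

    All inputs are illustrative assumptions; real pricing depends on
    instance type, region, and billing granularity.
    """
    return round(nodes * hours * hourly_rate_usd, 2)


# Hypothetical example: 20 nodes for 2 hours at $0.10 per node-hour.
cost = cluster_test_cost(nodes=20, hours=2, hourly_rate_usd=0.10)
print(f"Estimated test cost: ${cost:.2f}")
```

The point is less the exact figure than the shape of the formula: cost scales with how long the cluster lives, so tearing it down right after the test is what keeps repeated experiments cheap.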

OK, let’s get to some early details. I’ll start with a quick description of how we deploy into AWS, then in follow-up posts, I’ll dive into the details of these tiers of our deployment topology:

  • Web
  • DB
  • Worker
  • Storage
  • Search

Our Backup for Google Workspace product is conceptually pretty simple (but remember what I previously mentioned about simplicity and scale). We run a set of workers that connect to Google and pull down our users’ content, then store it in AWS. Our Web App has a view into this storage, so that users can browse their content and initiate restore or export jobs.

Hang on, because I’m about to play a serious game of AWS buzzword bingo: We deploy on EC2 instances running Ubuntu. These EC2 instances all have different roles for backup, restore, export, search, www, etc., and each role runs in its own Auto Scaling group that manages how many workers we have running based on a set of CloudWatch metrics. We use IAM to ensure that these worker boxes communicate securely with each other. We use SQS to feed work to all of our worker instances. All of our backups are written to S3, and we use RDS running MySQL (yep, MySQL) for our database tier.
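As a loose illustration of the “scale each role on a metric” idea, here is a minimal sketch of the kind of decision a scaling rule encodes. It assumes the metric is queue backlog, which is only a guess for this example; in a real deployment this logic lives in scaling policies and CloudWatch alarms rather than in application code, and the function name, parameters, and thresholds below are all hypothetical.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    """Pick a worker count for one role from its queue backlog.

    Hypothetical rule: one worker per `jobs_per_worker` queued jobs,
    clamped to the group's minimum and maximum size.
    """
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


# Illustrative backlogs for a single role, assuming 50 jobs per worker.
for depth in (0, 120, 10_000):
    print(depth, "->", desired_workers(depth, jobs_per_worker=50))
```

Keeping one such rule per role is what lets, say, export workers grow without dragging the backup workers along with them.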

Over the course of the next few blog posts, I’ll dive deep into each of these layers, describing how we use them in more detail and some of the tradeoffs we weighed to end up where we are. This is a work in progress, so I’m sure that by the time I finish this series, some of what I’ve described in the early posts will have changed. That is the beauty of AWS!