This is the story of how we cut down our AWS costs by 80% in just under two weeks.

AWS is a candy shop for developers

Let me begin with some context. We have used AWS since 2018 for all our projects, and it has worked wonders for us. We are a fully distributed team, and running our own data center somewhere in the world would be problematic. It is much easier to rent resources from AWS and skip the capital expenses.

The problem with AWS is that developers can create virtually any resources without approval from the finance department. With traditional data centers, this is not the case: buying an additional server means getting an invoice from the vendor and asking the finance department to pay it.

The root of the problem, then, is that with AWS, developers can buy resources in whatever amounts they want, whenever they want.

What did we do to cut AWS costs?

We are not a huge company, and our AWS costs are just over $7k per month across all AWS accounts. It is also worth mentioning that we host only DEV and QA environments, as PROD environments are paid for by our customers. Our resources are mostly individual dev machines, test databases, and various custom resources for research projects such as Kinesis Firehose, SageMaker, etc. So we have a lot of miscellaneous resources that are hard to categorize, structure, predict, and control.

So, how did we tackle lowering our AWS costs?

First, we opened Cost Explorer and identified the most expensive line items.

Second, we started moving everything possible to spot instances. The procedure is simple. For an individual machine, you shut it down, detach the volume (remember to write down the mount path), and then terminate the machine. Next, you create a new spot instance (the AMI doesn't matter, just make sure the CPU architecture is compatible with your previous volume). Once the spot instance is created, detach (and don't forget to delete!) its new volume and attach the previous volume at the same mount path it had on the original machine. For Beanstalk environments, it's even simpler: we just changed the capacity settings to use only spot instances.
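The volume-swap steps above can be sketched with boto3. Everything here is illustrative: the instance and volume IDs, the device name, and the assumption that the data volume and the new spot instance sit in the same availability zone. The function takes the EC2 client as a parameter and is only defined, not run.

```python
# Sketch of the spot-migration procedure described above (hypothetical IDs).
# Assumes the volume being moved is a data volume, not the root volume.

MOUNT_PATH = "/dev/xvdf"  # write this down before detaching the old volume!

def migrate_volume_to_spot(ec2, old_instance_id, data_volume_id,
                           spot_instance_id, spot_extra_volume_id):
    """Move a data volume from an on-demand instance to a new spot instance.

    `ec2` is a boto3 EC2 client (boto3.client("ec2")); calling this
    for real requires AWS credentials and existing resources.
    """
    # 1. Shut down the old machine and detach its data volume.
    ec2.stop_instances(InstanceIds=[old_instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[old_instance_id])
    ec2.detach_volume(VolumeId=data_volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[data_volume_id])
    # 2. Only now is it safe to terminate the old machine.
    ec2.terminate_instances(InstanceIds=[old_instance_id])
    # 3. Drop the spot instance's fresh volume (don't forget to delete it!)
    #    and attach the old data volume at the same mount path.
    ec2.stop_instances(InstanceIds=[spot_instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[spot_instance_id])
    ec2.detach_volume(VolumeId=spot_extra_volume_id)
    ec2.delete_volume(VolumeId=spot_extra_volume_id)
    ec2.attach_volume(VolumeId=data_volume_id,
                      InstanceId=spot_instance_id, Device=MOUNT_PATH)
    ec2.start_instances(InstanceIds=[spot_instance_id])
```

The ordering matters: the data volume must be detached before the old instance is terminated, or it may be deleted along with it.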

Savings: $1000/month

Third, we cleaned out unused S3 buckets (we ran some auto-trading bots that accumulated a lot of streaming data). We also set up automatic expiration in several S3 buckets so that we don't store trading data for more than a year, by which point it is completely obsolete and useless.
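The expiration can be set up with a one-time boto3 call that attaches an S3 lifecycle rule. A minimal sketch, assuming a hypothetical bucket name and that the rule should cover the whole bucket; the 365-day expiration matches the one-year retention above:

```python
# Expire objects after one year via an S3 lifecycle rule (sketch).

LIFECYCLE_CONFIG = {
    "Rules": [{
        "ID": "expire-trading-data-after-1-year",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},      # empty prefix = the whole bucket
        "Expiration": {"Days": 365},   # year-old trading data is obsolete
    }]
}

def apply_lifecycle(bucket_name):
    """Attach the rule to a bucket; needs AWS credentials, not called here."""
    import boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=LIFECYCLE_CONFIG,
    )
```

Once the rule is in place, S3 deletes the expired objects itself; no cron jobs or cleanup scripts to maintain.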

Savings: $300/month

Fourth, we shrank some resources. It's a matter of checking consumed CPU and RAM: if we see constant utilization below 50%, we lower the instance tier.
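The rule of thumb boils down to a tiny check. This is a sketch under two assumptions: both CPU and RAM must sit below the threshold, and the averages would come from CloudWatch metrics over a representative period in practice.

```python
def should_downsize(cpu_percent, ram_percent, threshold=50.0):
    """True when the instance is a candidate for a smaller tier.

    In practice the two averages would come from CloudWatch metrics
    over a couple of weeks; plain numbers here for illustration."""
    return cpu_percent < threshold and ram_percent < threshold

print(should_downsize(30, 42))  # True: both under 50%, lower the tier
print(should_downsize(30, 85))  # False: RAM is busy, leave it alone
```

The point of requiring both metrics to be low is to avoid starving a machine that is CPU-idle but memory-bound.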

Savings: $300/month (would be 3x more on on-demand instances)

Fifth, we set up auto-shutdown for individual machines. We created multiple Lambda functions for different types of tasks: shut down a SageMaker Jupyter notebook instance after 1 hour of inactivity, and shut down individual VMs and the DEV and QA environments overnight when nobody is working. These Lambda functions are triggered daily by CloudWatch Events. There are also Lambdas that bring the DEV and QA environments back up, to simplify the process.
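The nightly shutdown Lambda can be sketched like this. The `auto-shutdown` tag convention and the window hours are assumptions of the sketch, not our exact setup; the handler would be wired to a daily CloudWatch Events (EventBridge) schedule.

```python
from datetime import datetime, timezone

NIGHT_START_UTC = 19  # assumed quiet window: 19:00 to 06:00 UTC
NIGHT_END_UTC = 6

def is_night(hour_utc):
    """True inside the shutdown window; handles the wrap past midnight."""
    return hour_utc >= NIGHT_START_UTC or hour_utc < NIGHT_END_UTC

def handler(event, context):
    """Lambda entry point, triggered by a daily CloudWatch Events rule."""
    if not is_night(datetime.now(timezone.utc).hour):
        return {"stopped": []}  # fired outside the window, do nothing
    import boto3  # available in the Lambda runtime; needs credentials
    ec2 = boto3.client("ec2")
    reservations = ec2.describe_instances(Filters=[
        {"Name": "tag:auto-shutdown", "Values": ["true"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```

Tag-based filtering keeps the function generic: opting a machine in or out of auto-shutdown is just a matter of editing its tags, with no code change.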

Savings: $500/month

Also, we implemented some smaller solutions for further savings, but they are not covered in this article.

So far, we have cut about $5,500 from our $7,000 monthly bill, which is around 80% of all costs! I knew we were overspending on AWS, but I never realized it was THAT much. Over the course of a year, that means about $66,000 in savings.

How do organizations approach cloud cost optimization?

After going through our own cloud cost optimization, I understood how important it is to track cloud costs carefully. The savings can be enough to boost the business if you put the money into marketing. Or you could take it out as dividends and buy a new car. The sum is substantial, and there are many things you can do with it.

Since it is beyond question that cloud cost optimization is a necessary endeavor, how do companies approach it? Let's walk through ways of implementing cloud waste management, from the simplest to the most advanced.

1. Buying just virtual machines

You could approach the problem in the most traditional way possible: ignore the countless services AWS provides and restrict your developers to plain EC2 machines.

SQS? No. DynamoDB? No. Just use EC2 virtual machines and install everything on them.

Pros:

Cons:

All in all, it is not a good strategy to treat the cloud as if you were just renting hosting from GoDaddy.

2. Review every request

What if you allow developers to use and scale any resources, but every request must go through a dedicated department that controls costs? Developers cannot buy or scale resources themselves, but they can ask a designated person to do it for them.

Let's say a developer needs a Kinesis Firehose endpoint (yes, a service you most probably have not even heard of). Would it be easy for the developer to explain what they want to the controller? The developer would then also have to explain the reasoning behind the scaling and probably even prove that the architecture choice is sound and not wasteful in terms of cost.

Looking at a concrete example like this, you can see that it simply does not work. It could work only if the cost management team consists of experts.

And that’s just the tip of the iceberg. Now consider:

Pros:

Cons:

3. Hire a FinOps team

A more advanced approach is to actually find and hire AWS experts who would control the spending. They can use the tools AWS provides out of the box, which include:

These tools are not user-friendly and require well-trained personnel who know what to do with them. With them, however, you can actually start controlling your cloud costs. This approach requires not only tools and highly skilled workers but also a framework for the team to work within: periodic check-ups of underutilized resources, shrink-and-clean procedures, and so on.

A team that is essentially DevOps with a finance-conscious approach is called FinOps.

Pros:

Cons:

4. Use cloud waste management software

Once you think seriously about hiring (or growing your own) FinOps team, you should also consider third-party cloud cost optimization software, such as Infinops. It is an automatic FinOps team member that works 24/7 and is not susceptible to human error. Such software automatically scans your cloud for underused resources and other known saving opportunities, such as:

All those tips arrive automatically, as your system is constantly scanned for changes. This advice can save you up to 80% of your monthly bill, which usually means at least tens of thousands of dollars over the course of a year.

Pros:

Cons:

In conclusion, I'd like to say that managing AWS costs can be tricky. Our own 80% savings show that it's possible to spend less with the right moves. Whether you're setting limits on resources, requiring approvals, relying on expert teams, or using automated tools, it's essential to keep a close eye on expenses. After all, the cloud should make things easier, not pricier.