In a previous post we discussed how to auto-subscribe a CloudWatch Log Group to a Lambda function using CloudWatch Events, so that we don’t need a manual process to ensure all Lambda logs go to our log aggregation service.

Whilst this is useful in its own right, it only scratches the surface of what we can do. CloudTrail and CloudWatch Events make it easy to automate many day-to-day operational steps. With the help of Lambda of course ;-)

I work with API Gateway and Lambda heavily. Whenever you create a new API, or make changes, there are several things you need to do, such as updating dashboards and setting up alarms for the new endpoints.

Because these are manual steps, they often get missed.

Have you ever forgotten to update the dashboard after adding a new endpoint to your API? And did you also remember to set up a p99 latency alarm on this new endpoint? How about alarms on the number of 4XX or 5XX errors?

Most teams I have dealt with have some conventions around these, but no way to enforce them. The result is that the conventions are applied inconsistently and cannot be relied upon. I find this approach doesn’t scale with the size of the team.

It works when you’re a small team. Everyone has a shared understanding, and the necessary discipline to follow the convention. When the team gets bigger, you need automation to help enforce these conventions.

Fortunately, we can automate away these manual steps using the same pattern. In the Monitoring unit of my course Production-Ready Serverless, I demonstrated how you can do this in 3 simple steps:

If you use the Serverless framework, then you might have a function that looks like this:

A couple of things to note from the code above:

The captured event looks like this:

We can find the restApiId and stageName inside the detail.requestParameters attribute. That’s all we need to figure out which endpoints exist, and therefore which alarms we need to create.
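As a minimal sketch of that lookup (written in Python here, which may not be the language of the actual handler), the function below pulls the two values out of a captured event. The sample event is trimmed and its IDs are made up. The fallback to a nested createDeploymentInput.stageName is an assumption about how CloudTrail records the call, so check it against a real captured event.

```python
from typing import Any, Dict, Tuple


def extract_api_and_stage(event: Dict[str, Any]) -> Tuple[str, str]:
    """Return (restApiId, stageName) from a CloudTrail event delivered by CloudWatch Events."""
    params = event["detail"]["requestParameters"]
    rest_api_id = params["restApiId"]
    # Assumption: stageName may sit directly under requestParameters, or be
    # nested under createDeploymentInput, depending on the captured call.
    stage_name = params.get("stageName")
    if stage_name is None:
        stage_name = params.get("createDeploymentInput", {}).get("stageName")
    if stage_name is None:
        raise KeyError("stageName not found in detail.requestParameters")
    return rest_api_id, stage_name


# Trimmed, hypothetical event; the values are for illustration only.
sample_event = {
    "source": "aws.apigateway",
    "detail": {
        "eventName": "CreateDeployment",
        "requestParameters": {
            "restApiId": "abc123",
            "createDeploymentInput": {"stageName": "dev"},
        },
    },
}

print(extract_api_and_stage(sample_event))
```

Raising on a missing stageName, rather than defaulting it, keeps a malformed or unexpected event from silently creating alarms against the wrong stage.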

Inside the handler function, which you can find here, we perform a few steps:
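To give a feel for the endpoint discovery part, here is a rough Python sketch that is not taken from the linked handler. It assumes the resources were already fetched with API Gateway's GetResources call (in boto3, `get_resources` on the `apigateway` client), and that each item carries a `path` plus an optional `resourceMethods` map keyed by HTTP method. The sample data and the function name are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def list_endpoints(resources: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Flatten GetResources-style items into sorted (path, HTTP method) pairs."""
    endpoints = []
    for resource in resources:
        # Resources without any methods (e.g. a bare parent path) are assumed
        # to have no resourceMethods key, so they contribute nothing.
        for method in resource.get("resourceMethods", {}):
            endpoints.append((resource["path"], method))
    return sorted(endpoints)


# Hypothetical items shaped like the "items" list of a GetResources response.
sample_resources = [
    {"id": "r0", "path": "/"},
    {"id": "r1", "path": "/orders", "resourceMethods": {"POST": {}, "GET": {}}},
    {"id": "r2", "path": "/orders/{id}", "resourceMethods": {"GET": {}}},
]

for path, method in list_endpoints(sample_resources):
    print(method, path)
```

Each (path, method) pair is then one candidate for an alarm. Keeping this as a pure function over the response makes it easy to test without calling AWS.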

Now, every time I create a new API, I will have CloudWatch Alarms to alert me when the 99th percentile latency for an endpoint goes over 1 second for 5 minutes in a row.
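For reference, an alarm like that could be described with arguments along these lines. This is a sketch under assumptions, not the handler's actual code: it reads "5 minutes in a row" as five consecutive 1-minute periods, the alarm naming convention is made up, and the per-endpoint dimensions only receive data once detailed metrics are enabled on the stage. The keys are those accepted by CloudWatch's PutMetricAlarm API.

```python
from typing import Any, Dict


def latency_alarm_params(api_name: str, stage: str, resource: str, method: str) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments for a p99 latency alarm on one endpoint."""
    return {
        # Hypothetical naming convention; pick whatever suits your team.
        "AlarmName": f"{api_name}-{stage}-{method}-{resource}-p99-latency",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",  # API Gateway reports latency in milliseconds
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
            {"Name": "Resource", "Value": resource},
            {"Name": "Method", "Value": method},
        ],
        "ExtendedStatistic": "p99",
        "Period": 60,            # one-minute periods (assumption)
        "EvaluationPeriods": 5,  # five breaching periods in a row
        "Threshold": 1000.0,     # 1 second, expressed in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = latency_alarm_params("my-api", "dev", "/orders", "GET")
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Note that the metric is keyed by the API's name rather than its restApiId, so the name would need to be looked up first.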

All this, with just a few lines of code :-)

You can take this further, and have other Lambda functions to:

So there you have it, a useful pattern for automating away manual ops tasks!

And before you even have to ask, yes I’m aware of this serverless plugin by the ACloudGuru folks. It looks neat, but it’s ultimately still something the developer has to remember to do.

That requires discipline.

My experience tells me that you cannot rely on discipline, ever. Which is why I prefer to have a platform in place that will generate these alarms instead.

Hi, my name is Yan Cui. I’m an AWS Serverless Hero and the author of Production-Ready Serverless. I have run production workloads at scale in AWS for nearly 10 years, and I have been an architect or principal engineer in a variety of industries ranging from banking, e-commerce and sports streaming to mobile gaming. I currently work as an independent consultant focused on AWS and serverless.
