Lambda that is called regularly like a cron task

Do you want to run a task/function regularly, but don’t want to pay for an EC2 instance just to run cron? Or don’t want to set up and manage the instance?

You can use AWS lambdas (and zappa) for that. As an example, let’s create and deploy a function that prints Hello World! in the logs once a minute.

First install zappa: pip install zappa.

Second, build the function that will be called and save it in a file called event_test.py:

def process_event(event, context):
    # event: the scheduled CloudWatch event; context: the lambda runtime context
    print('Hello World!')

Then create the zappa_settings.json file (JSON doesn’t allow comments, so the fields are explained below the snippet):

{
    "event_hello": {
        "project_name": "event-hello",
        "runtime": "python3.6",
        "events": [{
            "function": "event_test.process_event",
            "expression": "rate(1 minute)"
        }],
        "apigateway_enabled": false,
        "s3_bucket": "<CHANGE_THIS>"
    }
}

Here function is the function to execute and expression says when to execute it, either in rate format (as above) or in cron format (e.g. cron(0 12 * * ? *) for every day at noon UTC). apigateway_enabled is false because we don’t need web access to the lambda, and s3_bucket is the bucket zappa uses for the deploy (replace <CHANGE_THIS> with a bucket you control).

Then deploy it and enable the events with:

$ zappa deploy event_hello
Calling deploy for stage event_hello..
Creating event-hello-event-hello-ZappaLambdaExecutionRole IAM Role..
Creating zappa-permissions policy on event-hello-event-hello-ZappaLambdaExecutionRole IAM Role.
Downloading and installing dependencies..
 - sqlite==python36: Using precompiled lambda package
Packaging project as zip.
Uploading event-hello-event-hello-1540029016.zip (5.0MiB)..
100%|█████████████████████████████████████████████████████████████████████████████████| 5.24M/5.24M [00:02<00:00, 1.60MB/s]
Scheduling..
Scheduled event-hello-event-hello-event_test.process_event with expression rate(1 minute)!
Scheduled 25a24ee2d25a447b827d55fd774ecf9a586a4-handler.keep_warm_callback with expression rate(4 minutes)!
Deployment complete!

And now you can see that every minute you get a new ‘Hello World!’ in the logs, which you can check with zappa tail.

Note: keep_warm refers to a default zappa event that pings your lambda every few minutes so it stays warm between invocations. You can check what it does in the README.
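
If you don’t want that ping for a given lambda, the README also documents a keep_warm setting that you can switch off; a sketch, merged into the stage config above:

{
    "event_hello": {
        ...
        "keep_warm": false
    }
}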

If you want to stop the events but keep the lambda deployed, do:

$ zappa unschedule event_hello
Calling unschedule for stage event_hello..
Unscheduling..
Unscheduled 25a24ee2d25a447b827d55fd774ecf9a586a4-handler.keep_warm_callback.
Unscheduled event-hello-event-hello-event_test.process_event.

And to re-enable them, do:

$ zappa schedule event_hello
Calling schedule for stage event_hello..
Scheduling..
Scheduled event-hello-event-hello-event_test.process_event with expression rate(1 minute)!
Scheduled 25a24ee2d25a447b827d55fd774ecf9a586a4-handler.keep_warm_callback with expression rate(4 minutes)!

Have fun!


Where to get help for Zappa?

Stuck with zappa? Assuming you’re looking for help with the python lambda library and not the singer, the following should help.

The main documentation is the project’s README: https://github.com/Miserlou/Zappa/blob/master/README.md (check the links section for particular integrations, e.g. django, flask, alexa skills).

If you think you found a bug, raise it in the GitHub issues: https://github.com/Miserlou/Zappa/issues. The same goes for feature requests. If you already have a fix or an implementation, create a pull request: https://github.com/Miserlou/Zappa/pulls.

If you are looking for support, discussion, or answers to questions, the best resource is the slack channel: https://slack.zappa.io/. If you’re looking for paid support, ask in the #support channel; otherwise #general or #aws are usually fine.

Hope this gets you unstuck!


Managing python dependencies in a reliable way

Are you running your development environment with different packages than production? And your tests? Or even different parts of production?

The common advice is to pin the versions of your packages in a requirements.txt file (that is, to record exactly which version of each package works for your program). That advice fails in practice because the most common way to pin dependencies is to install them in a virtual environment and run pip freeze > requirements.txt. The problems with this are multiple:

  1. The environment might have broken dependencies. pip install only checks the dependency requirements for the last package installed, so that’s the only one guaranteed to work.
  2. That environment will have the direct dependencies and, recursively, the dependencies of dependencies. It will be difficult to identify what your app actually requires in the future (particularly when checking for version conflicts on dependencies of dependencies).

This is not the only solution: you can use pipconflictchecker to deal with the first problem, or you can use pipenv. But pip-tools has served me well for years, so I thought I’d share my workflow and how it solves the problems above.

First you need to install pip-tools (preferably inside your app’s virtual environment): pip install pip-tools

Then you need to create the file requirements.in, which has mostly the same format as a requirements.txt file. It contains the packages your app requires directly. I usually avoid setting versions in it, unless there’s a known issue with a specific package (in which case the previous line gets a comment with a link to the issue). This is the file you’ll be creating and updating by hand; every dependency in it should be used directly by your app.
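
For illustration, a requirements.in for a small web app could look like this (the package names and the issue link are just examples):

# direct dependencies only; versions left free unless there's a known issue
flask
requests
# hypothetical regression, see https://github.com/example/example/issues/1
celery<4.2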

The next step is “compiling” the requirements with pip-compile. This command fetches the dependency information for each package and calculates a set of pinned versions that satisfies the requirements of every package, including dependencies of dependencies. The result is saved to a file called requirements.txt. You’ll notice that, for dependencies of dependencies, pip-compile generates comments explaining which packages require them.
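
The generated file looks something like this (names and versions are purely illustrative):

#
# This file is autogenerated by pip-compile
#
certifi==2018.10.15       # via requests
chardet==3.0.4            # via requests
idna==2.7                 # via requests
requests==2.20.0
urllib3==1.24.1           # via requests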

Once you have the requirements.txt file you can install it with pip install -r requirements.txt, but I prefer pip-tools’ pip-sync, which not only installs the packages but also uninstalls any package that is not in the requirements.txt file.

I also include the development/test dependencies in requirements.in. That’s the only way to guarantee that the dependency set in requirements.txt has the same versions that are tested. It’s possible to split out a requirements-dev.txt subset of requirements.txt and run pip uninstall -r requirements-dev.txt before deploying to production, but I’ll admit I only do that for more complex projects.
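
A sketch of that dance, assuming a hand-maintained requirements-dev.txt that lists only the dev/test packages (a hypothetical file name):

$ pip-sync requirements.txt                 # full pinned set, dev included
$ pip uninstall -r requirements-dev.txt -y  # strip dev-only packages before shipping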

With this setup I can then use the following cheatsheet:

  • Add a package as a dependency: add the package name to requirements.in, then run pip-compile.
  • Remove a dependency the app no longer uses: remove the package name from requirements.in, then run pip-compile.
  • Upgrade dependencies to the current versions: run pip-compile --upgrade. (1)
  • Update the local environment: run pip-sync.
  • Deploy: run pip-sync on the server/docker image build.

(1) I usually run this weekly to create a pull request, run the tests, and QA the changes.


Zappa, the packager

Are you having problems with packaging your dependencies for a Lambda? You’ve done the AWS lambda python tutorials but now need to deal with something more complex? Do you just want to build a package to upload in the console or with terraform?

Zappa is a great tool for handling everything from publishing your code to integrating it with other AWS services, with the minimum of modifications. That sometimes overshadows the fact that you can use it for smaller roles too, like just packaging a lambda into a zip file that you can then upload in the console and/or with terraform.

So, let’s imagine you have your code in a file called myfunction.py doing something like:

def lambda_handler(event, context):
    print('Hello World!')

To package it do the following:

1) Create a virtual environment for your function, using a python that matches the lambda runtime (3.6 here): python3 -m venv ve.

2) Activate the virtual environment: . ve/bin/activate (on Mac OS X or GNU/Linux) or ve\Scripts\activate (on Windows).

3) Install zappa and any other dependencies you need with pip install <dependency>.

4) Create a zappa_settings.json with the following content:

{
    "dev": {
        "lambda_handler": "myfunction.lambda_handler",
        "runtime": "python3.6"
    }
}

5) Run zappa package dev.

6) And that’s it: zappa has created a zip file that you can now upload to AWS.
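
As a sketch, assuming the function already exists in AWS (the function name below is hypothetical, and you’d substitute the real name of the zip file zappa created), you could upload the package with the AWS CLI:

$ aws lambda update-function-code \
    --function-name myfunction \
    --zip-file fileb://<the-zip-zappa-created>.zip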


Your queries are slow, but memory/cpu usage is low – part II

In the previous article, Your queries are slow, but memory/cpu usage is low, I mentioned that one of the reasons you can observe slowness in an AWS app that uses RDS is IOPS budgeting. The main symptom you’ve already gathered from the title, and that’s why you’re reading this: slow queries. You can actually see whether the issue is the IOPS budget, not because AWS made it easy to keep track of (they haven’t), but because the effect is visible in the metrics: you’ll notice a flatline in the maximum Read/Write IOPS.

To see the metrics:

  • Go to https://console.aws.amazon.com/rds/
  • Click instances in the left sidebar.
  • Click the instance you’re having issues with.
  • Under CloudWatch there’s a search box; type IOPS there.

If you’re seeing a flatline at a number that is 3 times your storage capacity in GB (e.g. for a 10 GB EBS volume, that would be 30 IOPS), then you’ve run out of budget. If you see a bunch of peaks under and over that maximum, you have budget to spare.
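
To put numbers on it: at 3 IOPS per GB, a 100 GB volume gets a 300 IOPS baseline; if you want a 1000 IOPS baseline through storage alone, you’d need to grow the volume to roughly 334 GB.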

If you see the flatline of slowness, you have two choices:

  1. If an increase to less than 1000 IOPS is enough, you can increase the storage you are using. You probably won’t need the space; you’re doing it just for the IOPS. If you have a multi-AZ RDS instance the downtime will be minimal (under a minute in my latest experience).
  2. If you want more than 1000 IOPS then you can add Provisioned IOPS to your instance:
  • On the same screen as previously, click instance actions on the top right corner.
  • Under Storage type, choose: Provisioned IOPS (SSD).
  • Now a new option will appear for choosing how much capacity you want to provision:

[Screenshot: the Provisioned IOPS field under Storage type in the AWS console]

Note: remember that these changes will make your database slower during the conversion, so please do it at a time when your users don’t need it.


Changing function arguments and having tests pass because the function was mocked

Recently I changed a function to receive a single argument instead of the previous two. Unfortunately, that resulted in another function that called it failing later. The issue wasn’t lack of tests (the function had 100% coverage), but the way it was tested.

from unittest.mock import patch

from module import task

@patch('module.function_changed')
def test_that_function_was_called(mock_function):
    task()
    mock_function.assert_called_with('arg1', 'arg2')
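
For context, here’s a minimal sketch of the kind of code under test (names and bodies are hypothetical reconstructions):

# module.py (hypothetical sketch)
def function_changed(combined_arg):
    # now takes a single argument; it used to take two (arg1, arg2)
    print(combined_arg)

def task():
    # still calls with two arguments: broken, but the mocked test won't notice
    function_changed('arg1', 'arg2')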

One of the downsides of using mock objects in tests is that you lose the connection to the original object. So the code above runs happily and the test passes, because mock_function is a plain Mock object that accepts any call. But we don’t need to accept this: a simple change will make the Mock object validate the arguments against the original function:

@patch('module.function_changed', autospec=True)
def test_that_function_was_called(mock_function):
    task()
    mock_function.assert_called_with('arg1', 'arg2')

By adding autospec=True to the patch call, the arguments are validated against the original function’s signature. In this case the call inside task() would raise TypeError: function_changed() takes 1 positional argument but 2 were given, and the test would fail. And now you’d know that something was wrong well before it gets deployed…


Your queries are slow, but memory/cpu usage is low – part I

Are you noticing timeouts on queries running on your PostgreSQL database? Or just slower responses to queries? And when you look at the RDS monitoring, the cpu and memory look like the machine is doing nothing?

There are two main causes I’ve found for this: lock contention and IOPS limits. Today I’m talking about the most likely of them: lock contention.

So, first things first: how do you identify whether you have lock contention? There are two queries that will help you see what’s going on:

SELECT COUNT(*) FROM pg_locks

This will show you how many locks are in your system at a point in time. If the database is not being used, the number should be zero. If it’s being used, I’d expect the locks to go up and sometimes fall back to zero (we’re talking about low load scenarios). If that doesn’t happen (or rarely happens), you might have slow transactions holding locks for a long time and making everything else wait for them.

You can confirm if that’s the case with this query:

SELECT * FROM pg_stat_activity WHERE state <> 'idle'

You’ll be looking for rows with a state of idle in transaction and/or a wait_event_type of Lock.
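
To dig deeper, you can also list the lock requests that are currently waiting (a minimal sketch; pg_locks has more columns worth inspecting):

SELECT pid, locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE NOT granted;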

If you see these signs, there are two strategies you can use to speed things up:

  • Optimise the queries that are causing the locks. The quickest return will probably come from adding some indexes; I’d recommend taking a look at PostgreSQL’s wiki index usage query to see where you’ll get the biggest return (a variant is sketched after this list).
  • Reduce the locks your application takes (usually by eliminating queries). Remember those less-than-clear queries you were thinking of cleaning up another day? It might be the right time.
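
Here’s a variant of that index usage query, adapted from the PostgreSQL wiki (treat it as a starting point; the integer division keeps the percentage coarse):

SELECT relname,
       100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
       n_live_tup AS rows_in_table
FROM pg_stat_user_tables
WHERE seq_scan + idx_scan > 0
ORDER BY n_live_tup DESC;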

And that’s how you can make your app faster.


Provided role cannot be assumed by principal ‘events.amazonaws.com’.

You did a zappa deploy and it failed with An error occurred (ValidationException) when calling the PutRule operation: Provided role <your lambda role> cannot be assumed by principal 'events.amazonaws.com'.?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the events service (events.amazonaws.com) to assume the role, so scheduled events can’t use it to trigger your lambda. So, how do we add that authorization?

  • Go to https://console.aws.amazon.com/iam/
  • Click roles on the left.
  • Click the role you want to use for lambda.
  • Click the tab trust relationships.
  • Click the button Edit trust relationship.
  • If this role is only going to be used by lambda, you can just replace the policy with:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "apigateway.amazonaws.com",
                        "lambda.amazonaws.com",
                        "events.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    
  • If not, just make sure you have events.amazonaws.com as a Service in a Statement that allows sts:AssumeRole:

            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "apigateway.amazonaws.com",
                  "lambda.amazonaws.com",
                  "events.amazonaws.com"
                ]
              },
              "Action": "sts:AssumeRole"
            }
    
  • Click Update trust policy.
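
If you prefer the command line, the same can be done with the AWS CLI (a sketch assuming you saved the policy above as trust.json and that the role is called my-lambda-role, both hypothetical names):

$ aws iam update-assume-role-policy \
    --role-name my-lambda-role \
    --policy-document file://trust.json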

In the end you should see something like this:

[Screenshot: trust relationships for the lambda role]


The role defined for the function cannot be assumed by Lambda

You did a zappa deploy and it failed with InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the CreateFunction operation: The role defined for the function cannot be assumed by Lambda?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the lambda service (lambda.amazonaws.com) to assume the role, so lambdas can’t use it. So, how do we add that authorization?

  • Go to https://console.aws.amazon.com/iam/
  • Click roles on the left.
  • Click the role you want to use for lambda.
  • Click the tab trust relationships.
  • Click the button Edit trust relationship.
  • If this role is only going to be used by lambda, you can just replace the policy with:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "apigateway.amazonaws.com",
                    "lambda.amazonaws.com",
                    "events.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  • If not, just make sure you add the following statement to the Statement list:
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "apigateway.amazonaws.com",
              "lambda.amazonaws.com",
              "events.amazonaws.com"
            ]
          },
          "Action": "sts:AssumeRole"
        }

  • Click Update trust policy.

In the end you should see something like this:

[Screenshot: trust relationships for the lambda role]


zappa-sentry, automatic integration of zappa and sentry

Want to know when something goes wrong in a lambda? Tired of replicating all your alarm setup for each lambda?

If you just want a simple setup on each lambda that guarantees that you get alerts with enough information to be actionable, zappa-sentry is for you.

You’ll need a sentry project DSN. If you don’t have one (or don’t even have an account), you can create one for free at https://sentry.io/.

How to use?

First, install zappa-sentry in your lambda’s virtual environment with pip install zappa-sentry (if you’re using a requirements.txt to manage dependencies, don’t forget to add zappa-sentry to it).

Next, set your sentry DSN as the value of the environment variable SENTRY_DSN, either in the zappa_settings.json file or with any of the other methods described at https://github.com/miserlou/zappa/#setting-environment-variables

Then you can set up zappa_sentry.unhandled_exceptions as the exception handler.

Example:

{
    "dev": {
        ...
        "environment_variables": {
            "SENTRY_DSN": "https://*key*:*pass*@sentry.io/*project*",
            ...
        },
        "exception_handler": "zappa_sentry.unhandled_exceptions",
        ...
    },
    ...
}

And that’s all. Deploy your zappa function and you should see any errors appearing in sentry.
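
To verify the integration end to end, you can temporarily deploy a handler that fails on purpose (a throwaway sketch; the name and message are made up) and check that the exception shows up in your sentry project:

def failing_handler(event, context):
    # deliberately raise so zappa_sentry's exception handler reports it to sentry
    raise RuntimeError('zappa-sentry smoke test')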
