Your queries are slow, but memory/cpu usage is low – part II

In the previous article, Your queries are slow, but memory/cpu usage is low, I mentioned that one of the reasons you can observe slowness in an app on AWS that uses RDS is IOPS budgeting. The main symptom you’ve already gathered from the title, and that’s why you are reading this: slow queries. You can actually tell whether the issue is the IOPS budget, not because AWS made it easy to keep track of (they haven’t), but because the effect is visible in the metrics: you’ll notice a flatline in the maximum Read/Write IOPS.

To see the metrics:

  • Go to https://console.aws.amazon.com/rds/ .
  • Click instances in the left side bar.
  • Click the instance you’re having issues with.
  • Under CloudWatch there’s a search box; type IOPS there.

If you’re seeing a flatline at a number that is 3 times the storage capacity in GB (e.g. for a 10GB EBS volume, that would be 30 IOPS), then you’ve run out of budget. If you see a bunch of peaks that go under and over that maximum, you have budget to spare.
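If you’d rather pull the same metric from code than click through CloudWatch, something like this works (a minimal sketch using boto3; the instance identifier is a placeholder):

import datetime

import boto3

cloudwatch = boto3.client('cloudwatch')

# Maximum ReadIOPS over the last 6 hours, in 5 minute buckets.
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/RDS',
    MetricName='ReadIOPS',
    Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'your-instance-id'}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=6),
    EndTime=datetime.datetime.utcnow(),
    Period=300,
    Statistics=['Maximum'],
)

for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Maximum'])

A flatline shows up as the same Maximum value repeated bucket after bucket (repeat the call with MetricName='WriteIOPS' for the write side).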

If you see the flatline of slowness, you have two choices:

  1. If you just want an increase to less than 1000 IOPS, you can increase the storage you are using. You probably won’t need the space, but you’re just doing it for the IOPS. If you have a multi-AZ RDS instance the downtime will be minimal (under a minute in my latest experience).
  2. If you want more than 1000 IOPS then you can add Provisioned IOPS to your instance:
  • On the same screen as previously, click instance actions on the top right corner.
  • Under Storage type, choose: Provisioned IOPS (SSD).
  • Now a new option will appear for choosing how much capacity you want to provision:

Screenshot showing the Provisioned IOPS field under Storage Type on the AWS console

Note: Remember that these changes will make your database slower during the conversion, so please do it at a time when your users don’t need it.
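If you’d rather make the change from code than from the console, boto3’s modify_db_instance covers both options (a sketch; the instance identifier and the numbers are placeholders, and the same conversion caveat applies):

import boto3

rds = boto3.client('rds')

# Option 1: grow the volume to raise the baseline (roughly 3 IOPS per GB).
rds.modify_db_instance(
    DBInstanceIdentifier='your-instance-id',
    AllocatedStorage=100,        # GB
    ApplyImmediately=True,
)

# Option 2 (instead of option 1): switch to Provisioned IOPS storage.
# rds.modify_db_instance(
#     DBInstanceIdentifier='your-instance-id',
#     StorageType='io1',
#     Iops=1000,
#     ApplyImmediately=True,
# )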

Want to get python/AWS tips on your email? Just subscribe!

Changing function arguments and having tests pass because the function was mocked

Recently I changed a function to receive a single argument instead of the previous two. Unfortunately that resulted in another function failing later. The issue wasn’t lack of tests (that function had 100% coverage), but the way this function was tested.

from unittest.mock import patch

from module import task  # task() is the function that calls function_changed

@patch('module.function_changed')
def test_that_function_was_called(mock_function):
    task()
    mock_function.assert_called_with('arg1', 'arg2')

One of the downsides of using mock objects in tests is that you lose the connection to the original object. So the code above runs happily and the test passes, because mock_function is a plain Mock object. But we don’t need to accept this; a simple change will make the Mock object validate that the arguments are valid for the original function:

@patch('module.function_changed', autospec=True)
def test_that_function_was_called(mock_function):
    task()
    mock_function.assert_called_with('arg1', 'arg2')

By adding autospec=True to the patch call, the arguments are validated against the original function’s signature. In this case the call would raise TypeError: function_changed() takes 1 positional argument but 2 were given and the test would fail. Now you’d know something was wrong well before it got deployed…
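To make the failure mode concrete, here’s a hypothetical module.py matching the scenario above (the names are made up for illustration):

# module.py -- hypothetical reconstruction of the scenario above

def function_changed(arg):          # used to be: def function_changed(arg1, arg2)
    ...

def task():
    # call site that was never updated to the new single-argument form
    function_changed('arg1', 'arg2')

With a plain Mock the bad call inside task() goes unnoticed; with autospec=True the mock checks the call against the real one-argument signature and fails with the TypeError above.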

Want to get python/AWS tips on your email? Just subscribe!

Your queries are slow, but memory/cpu usage is low – part I

Are you noticing timeouts on queries running on your PostgreSQL database? Or just slower responses to queries? And when you look at the RDS monitoring, do the CPU and memory look like the machine is doing nothing?

There are two main causes I’ve found for this: lock contention and IOPS limits. Today I’m talking about the most likely of them: lock contention.

So, first things first, how to identify if you have lock contention? There are two queries that will help you identify what’s going on:

SELECT COUNT(*) FROM pg_locks

This will show you how many locks are in your system at a point in time. If the database is not being used, the number should be zero. If it’s being used, I’d expect the locks to go up and sometimes fall back down to zero (we’re talking about low load scenarios). If that doesn’t happen (or rarely happens), you might have slow transactions holding locks for a long time and making everything else wait for them.

You can confirm if that’s the case with this query:

SELECT * FROM pg_stat_activity WHERE state <> 'idle'

You’ll be looking for rows with a state of idle in transaction and/or a wait_event_type of Lock.
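If you want to keep an eye on this over time rather than running the queries by hand, a small polling script does the job (a minimal sketch using psycopg2; the connection string is a placeholder):

import time

import psycopg2

conn = psycopg2.connect("postgresql://user:password@your-rds-host:5432/yourdb")
conn.autocommit = True  # don't hold our own transaction open while polling

while True:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM pg_locks")
        lock_count = cur.fetchone()[0]

        cur.execute(
            "SELECT pid, state, wait_event_type, query "
            "FROM pg_stat_activity WHERE state <> 'idle'"
        )
        active = cur.fetchall()

    print(f"locks: {lock_count}")
    for pid, state, wait_event_type, query in active:
        print(f"  pid={pid} state={state} wait={wait_event_type} query={query[:60]}")

    time.sleep(5)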

If you see these signs, there are two strategies you can use to speed things up:

  • Optimise the queries that are causing the locks. The quickest return will probably come from adding some indexes; I’d recommend taking a look at PostgreSQL’s wiki index usage query to see where you’ll get the biggest return.
  • Reduce locks in your application (usually by eliminating queries). Remember those less-than-clear queries you were thinking of cleaning up another day? It might be the right time.

And that’s how you can make your app faster.

Want to get python/AWS tips on your email? Just subscribe!

Provided role cannot be assumed by principal ‘events.amazonaws.com’.

You did a zappa deploy and it failed with An error occurred (ValidationException) when calling the PutRule operation: Provided role <your lambda role> cannot be assumed by principal 'events.amazonaws.com'.?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the CloudWatch Events service (events.amazonaws.com) to assume the role, so the scheduled events that zappa sets up can’t use it. So, how do we add that authorization?

  • Go to https://console.aws.amazon.com/iam/
  • Click roles on the left.
  • Click the role you want to use for lambda.
  • Click the tab trust relationships.
  • Click the button Edit trust relationship.
  • If this role is only going to be used by lambda, you can just replace the policy with:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "apigateway.amazonaws.com",
                        "lambda.amazonaws.com",
                        "events.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    
  • If not, just make sure you have events.amazonaws.com as a Service in the Statement that allows sts:AssumeRole:

            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "apigateway.amazonaws.com",
                  "lambda.amazonaws.com",
                  "events.amazonaws.com"
                ]
              },
              "Action": "sts:AssumeRole"
            }
    
  • Click Update trust policy.

In the end you should see something like this:

Trust relationships for lambda
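If you’d rather fix this from code, the same trust policy can be applied with boto3 (a sketch; the role name is a placeholder):

import json

import boto3

iam = boto3.client('iam')

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "apigateway.amazonaws.com",
                    "lambda.amazonaws.com",
                    "events.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

# Replaces the role's trust relationship with the policy above.
iam.update_assume_role_policy(
    RoleName='your-lambda-role',
    PolicyDocument=json.dumps(trust_policy),
)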

Want to get python/AWS tips on your email? Just subscribe!

The role defined for the function cannot be assumed by Lambda

You did a zappa deploy and it failed with InvalidParameterValueException: An error occurred (InvalidParameterValueException) when calling the CreateFunction operation: The role defined for the function cannot be assumed by Lambda?

You tried to create a lambda with a new handmade role only to be greeted by this cryptic error message. Or you tried to use an already existing role with lambda.

Translating the message: it means you haven’t authorized the lambda service to assume the role, so lambdas can’t use it. So, how do we add that authorization?

  • Go to https://console.aws.amazon.com/iam/
  • Click roles on the left.
  • Click the role you want to use for lambda.
  • Click the tab trust relationships.
  • Click the button Edit trust relationship.
  • If this role is only going to be used by lambda, you can just replace the policy with:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "apigateway.amazonaws.com",
                    "lambda.amazonaws.com",
                    "events.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  • If not, just add the following statement to the Statement list:
        {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "apigateway.amazonaws.com",
              "lambda.amazonaws.com",
              "events.amazonaws.com"
            ]
          },
          "Action": "sts:AssumeRole"
        }

  • Click Update trust policy.

In the end you should see something like this:

Trust relationships for lambda
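If you want to double-check from code which services the role trusts, you can read the trust policy back with boto3 (a sketch; the role name is a placeholder):

import boto3

iam = boto3.client('iam')

role = iam.get_role(RoleName='your-lambda-role')
trust_policy = role['Role']['AssumeRolePolicyDocument']

for statement in trust_policy['Statement']:
    print(statement['Effect'], statement['Principal'].get('Service'))

lambda.amazonaws.com needs to show up in an Allow statement for the CreateFunction call to go through.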

Want to get python/AWS tips on your email? Just subscribe!

zappa-sentry, automatic integration of zappa and sentry

Want to know when something goes wrong in a lambda? Tired of replicating all your alarm setup for each lambda?

If you just want a simple setup on each lambda that guarantees that you get alerts with enough information to be actionable, zappa-sentry is for you.

You’ll need a sentry project DSN. If you don’t have one or don’t even have an account you can create one for free at https://sentry.io/

How to use?

First, install zappa-sentry in your lambda’s virtual environment with pip install zappa-sentry (if you’re using a requirements.txt to manage dependencies, don’t forget to add zappa-sentry to it).

Next, set your sentry DSN as the value of the environment variable SENTRY_DSN, either in the zappa_settings.json file or with any of the other methods described at https://github.com/miserlou/zappa/#setting-environment-variables

Then you can set up the zappa_sentry.unhandled_exceptions handler.

Example:

{
    "dev": {
        ...
        "environment_variables": {
            "SENTRY_DSN": "https://*key*:*pass*@sentry.io/*project*",
            ...
        },
        "exception_handler": "zappa_sentry.unhandled_exceptions",
        ...
    },
    ...
}

And that’s all. Deploy your zappa function and you should see any errors appearing on sentry.

Want to get python/AWS tips on your email? Just subscribe!

Deleting old items in SQLAlchemy

Looking to delete old entries from a table because they’ve expired? Want to do it in an elegant way?

I usually like to split this kind of functionality into two parts: a method on the model that does the deleting, and a small wrapper function that can be invoked from cron, a celery scheduled task or a django command.

As an example, let’s say we want to delete all the log entries on a system that are over 181 days (6 months) old.

Assuming a model like:

import datetime

# "db" is assumed to be your Flask-SQLAlchemy instance (db = SQLAlchemy(app))
class LogEntry(db.Model):
    __tablename__ = 'log_entries'

    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(80))
    timestamp = db.Column(db.DateTime, default=datetime.datetime.utcnow)

First we add a method on the model that deletes expired log entries.

    @classmethod
    def delete_expired(cls):
        expiration_days = 181
        # use utcnow() to match the utcnow default on the timestamp column
        limit = datetime.datetime.utcnow() - datetime.timedelta(days=expiration_days)
        cls.query.filter(cls.timestamp <= limit).delete()
        db.session.commit()

You’ll notice the use of @classmethod; that’s needed so we can invoke it on the class rather than on an instance, as I’m doing in the next function (the one that can be called from a celery scheduled task, for instance):

def delete_expired_logs():
    LogEntry.delete_expired()

And with this you keep it elegant: all the relevant model information stays in the model class, so if someone renames the timestamp field they only have to change it in the delete_expired method, and you can still easily call it from somewhere else, like a task or a command.
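As an example of the second part in action, you could make the wrapper a Celery task and schedule it with beat (a sketch; the broker URL, import path and schedule are placeholders):

from celery import Celery
from celery.schedules import crontab

from yourapp.models import LogEntry     # placeholder import path

app = Celery('yourapp', broker='redis://localhost:6379/0')   # placeholder broker

@app.task
def delete_expired_logs():
    LogEntry.delete_expired()

# Run the cleanup once a day at 03:00.
app.conf.beat_schedule = {
    'delete-expired-logs': {
        'task': delete_expired_logs.name,
        'schedule': crontab(hour=3, minute=0),
    },
}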

Want to get python/AWS tips on your email? Just subscribe!