Most Django performance discussions focus on the object-relational mapper (ORM). This is because the performance profile of typical applications is dominated by the efficiency of the queries they make. For example, a common performance issue in Django applications is the presence of N + 1 queries, which can be eliminated using select_related and prefetch_related.
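
To make that concrete, here's a quick illustration with hypothetical Owner and Booking models (Booking has a ForeignKey named owner). The naive loop below issues one query for the bookings plus one query per booking, while select_related collapses it into a single JOIN:

# N + 1: one query for all bookings, plus one query per booking to load its owner.
for booking in Booking.objects.all():
    print(booking.owner.name)

# One query: the owners are fetched in the same SELECT via a JOIN.
for booking in Booking.objects.select_related('owner'):
    print(booking.owner.name)

# prefetch_related does the analogous thing for many-to-many and reverse
# relations, using one extra query per relation instead of one per object.
for owner in Owner.objects.prefetch_related('bookings'):
    bookings = list(owner.bookings.all())  # served from the prefetch cache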

Inefficient queries are usually easy to fix once identified, but how do you find them? A given query’s performance depends on many factors. For example:

  • Correct usage of database indices
  • Use of prefetch_related and select_related to avoid running many separate queries
  • Load on the database from other sources (such as Celery tasks)

You can monitor your database for performance issues, but that only tells you that they exist. If you see your database CPU spiking, how can you track down what’s happening in your application that might be the root cause? As your application grows and your team expands, it can become difficult to get a handle on the entire code base. How do you get a reasonable performance profile of a large application without having to manually audit every part of your code?

Observability is Everything

I’m a strong believer that you only know what can be seen, at least with respect to web services. It is absolutely critical to emit metrics from your application to make sure everything is working as expected. At Rover we use statsd to emit application health metrics and DataDog to aggregate and monitor them.

Metrics are not just for sanity checking your application and alarming on emergent issues. Metrics tell the story of exactly what is happening in your application, at whatever level of granularity you want. We can make use of our existing metrics infrastructure to help us understand the performance profile of our application with respect to what’s going on in the database.

We want to emit query metrics to characterize the performance of every request served by our application. Specifically, it would be very useful to know how many queries each request executes, and how long those queries take. How can we do it? The ORM abstracts away the execution of individual queries, so we need a more granular hook: we want to intercept every query that gets executed so we can instrument it with metrics.

Extending the ORM With a Custom Database Engine

Unfortunately Django does not provide any signals before/after query execution. We’ll need to dig into the ORM to find the exact point where queries are executed. Rover is backed by MySQL, so this is inside the MySQL database engine.

Django actually lets you specify your own database engine. You just need to implement a class called DatabaseWrapper that lives in a base.py module, and point your database configuration’s ENGINE at the package containing that module. Let’s see what this might look like.

Engine Implementation

Take a look at the implementation for Django’s builtin database engine for MySQL. It might look scary, but remember that we just want to surround the query execution with metrics. Django’s implementation uses a CursorWrapper class to implement the SQL execution. When the ORM wants to get a database cursor to execute queries, it calls the create_cursor method of the DatabaseWrapper class. So we’ll just need to extend the CursorWrapper to emit metrics after queries get executed, and extend the DatabaseWrapper class to give the ORM our special CursorWrapper.

Here’s an example implementation:

import time

from datadog import statsd

from django.db.backends.mysql.base import CursorWrapper
from django.db.backends.mysql.base import (
    DatabaseWrapper as DjangoDatabaseWrapper
)


class CustomCursorWrapper(CursorWrapper):
    def execute(self, query, args=None):
        # Time the underlying query and emit a histogram of its duration.
        start = time.time()
        result = super(CustomCursorWrapper, self).execute(query, args)
        end = time.time()
        statsd.histogram('myapp.query_time', end - start)
        return result


class DatabaseWrapper(DjangoDatabaseWrapper):
    def create_cursor(self, name=None):
        # Hand the ORM our instrumented cursor wrapper instead of the default one.
        cursor = self.connection.cursor()
        return CustomCursorWrapper(cursor)

We subclass the builtin CursorWrapper to time the queries, execute them by calling the superclass’s execute method, and emit a statsd histogram with the time it took to execute that query. We then implement the DatabaseWrapper such that it overrides the builtin class to return our custom cursor wrapper.

You can install the custom engine by referencing it in your DATABASES setting. Let’s say you put the above code in a file located at myapp/databases/mysql/base.py. Your database configuration would look something like:

DATABASES = {
    'default': {
        'ENGINE': 'myapp.databases.mysql',
        ...
    }
}

Making the Metrics Useful

Recall that our goal was to figure out what parts of our code are executing what kinds of queries. The myapp.query_time metric on its own is not very helpful; it just gives us the time of every query that gets run. Let’s add some useful tags. We can start with a simple one – the MySQL verb. This will let us break down our query volume by reads and writes.

We add to our implementation above to include the verb:

def execute(self, query, args=None):
    start = time.time()
    verb = query[:query.find(' ')]
    result = super(CustomCursorWrapper, self).execute(query, args)
    end = time.time()
    statsd.histogram(
        'myapp.query_time',
        end - start,
        tags=['verb:{}'.format(verb)]
    )
    return result

This code looks up the first word of the query, which is the MySQL verb (SELECT, UPDATE, INSERT, etc.), and tags the myapp.query_time metric with it.
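
One caveat: query.find(' ') returns -1 when a statement contains no space, which would silently truncate the verb. If that matters in your environment, a slightly more defensive variant (just a sketch, with a hypothetical _query_verb helper) could be:

def _query_verb(query):
    # Split on any whitespace and normalize case, so 'select ...', 'SELECT\n...',
    # and bare statements like 'COMMIT' all produce a sensible tag.
    parts = query.split(None, 1)
    return parts[0].upper() if parts else 'UNKNOWN'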

Making the Metrics More Useful

We now have a breakdown of query times across the different verbs. But we still want to tie this back to our application. Which requests are causing these queries to be executed? We need a way to propagate that information down into our custom database engine so we can tag our metric with it.

Unfortunately Django does not provide a request-local proxy such as Flask’s request, so you’ll have to implement this yourself. One easy way to do it is with a threading.local object that is initialized at the beginning of each request and cleared at the end. A great place to do this is with a custom middleware class. For example, you could implement the following in myapp/middleware/request_data.py:

import inspect

from threading import local

request_data = local()


class RequestDataMiddleware(object):
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        request_data.view_name = None
        return response

    def process_view(self, request, view_func, view_args, view_kwargs):
        module = inspect.getmodule(view_func).__name__
        request_data.view_name = '{}.{}'.format(module, view_func.__name__)

Before a view is executed, the fully qualified view_name (e.g. path.to.views.MyView) will be stored in the request_data thread local, and cleared when the request completes. We can now access the view name inside our database engine code and tag the query with it:

...
from myapp.middleware.request_data import request_data
...

def execute(self, query, args=None):
    start = time.time()
    verb = query[:query.find(' ')]
    result = super(CustomCursorWrapper, self).execute(query, args)
    end = time.time()
    statsd.histogram(
        'myapp.query_time',
        end - start,
        tags=[
            'verb:{}'.format(verb),
            'view_name:{}'.format(request_data.view_name)
        ]
    )
    return result

Now we can build dashboards to look at the slowest queries by verb, broken down by which view is executing them. That’s great! It gives us quite a bit of visibility into our application’s query performance, and lets us monitor the performance of new views as they’re added, without any additional effort from whoever is developing them.

And Even More Useful

There is a subtle issue with the myapp.query_time histogram. Recall that we wanted to know how many queries were executed and how long those queries took per request. Right now we are emitting the histogram metric for every single query, but this is a more granular metric than we want. What we really need to do is aggregate the queries that run during a request/response cycle and emit them once, as a single metric, at the end of the request.

There are a lot of ways to approach this. At Rover we opted to use a class to “collect” metrics. This class is instantiated at the beginning of a request and flushes its metrics when the request completes (successfully or not). It lives as a singleton throughout the request/response cycle (one object per request). The internal implementation details of this object are out of scope for this blog post, but they’re fairly simple; we basically track the total number of queries and net query time for each verb with a dictionary, and emit a separate histogram for each verb at the end of the request.
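
To make that concrete, here is a minimal sketch of what such a collector could look like. This is hypothetical code, not our actual implementation, and the metric names (myapp.query_count, myapp.request_query_time) are placeholders:

from collections import defaultdict

from datadog import statsd


class MetricsCollector(object):
    def __init__(self):
        # Per-verb query count and total query time for the current request.
        self.counts = defaultdict(int)
        self.times = defaultdict(float)

    def record_query(self, verb, duration):
        self.counts[verb] += 1
        self.times[verb] += duration

    def flush_metrics(self):
        # One counter and one histogram per verb, covering the whole request.
        # A view_name tag could be added here just like in the per-query version.
        for verb, count in self.counts.items():
            tags = ['verb:{}'.format(verb)]
            statsd.increment('myapp.query_count', count, tags=tags)
            statsd.histogram('myapp.request_query_time', self.times[verb], tags=tags)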

The interesting part is how the “collector” gets used. Imagine we’ve implemented a MetricsCollector class in myapp/metrics/collector.py. We should augment our middleware to initialize and store a MetricsCollector object inside our thread local, so it’s accessible from anywhere in the code base:

import inspect

from threading import local

from myapp.metrics.collector import MetricsCollector

request_data = local()


class RequestDataMiddleware(object):
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request_data.collector = MetricsCollector()
        response = self.get_response(request)
        request_data.collector.flush_metrics()
        request_data.view_name = None
        request_data.collector = None
        return response

    def process_view(self, request, view_func, view_args, view_kwargs):
        module = inspect.getmodule(view_func).__name__
        request_data.view_name = '{}.{}'.format(module, view_func.__name__)

The implementation of flush_metrics takes care of emitting the correct statsd metrics after the request completes. We’ll change our database engine to use this collector:

def execute(self, query, args=None):
    start = time.time()
    verb = query[:query.find(' ')]
    result = super(CustomCursorWrapper, self).execute(query, args)
    end = time.time()
    request_data.collector.record_query(verb, end - start)
    return result

Instead of emitting the metric directly in the code, we record the queries using the collector’s record_query method, which just stores a query count and net query time for each verb. At the end of the request, we’ll emit a single counter of the number of queries executed per verb and a histogram of the total query time per verb for that request.

Once these metrics are flowing, you’ll be able to break down the query count and total query time of individual requests for all your views. This enables incredibly deep insight into your application’s performance. Maybe you have a view that isn’t called often, but causes thousands of database queries. Or maybe you have a high-traffic view that doesn’t run many queries, but there’s a particular slow query that’s missing an index. At Rover these query metrics have enabled us to identify views with problematic queries and track down and fix the offenders. In one case we identified a view that ran as many as 600 queries per request and were able to reduce it to fewer than 60. You can even monitor and alarm on query count/query time to catch performance regressions before they cause systemic impact to your infrastructure!

Pitfalls

This level of observability is very powerful, but there are some things to be aware of.

  • There is no guarantee Django will support custom database engines in future versions, since this feature is technically undocumented. Be aware of this before depending too heavily on it.
  • Every query will go through your execute method. Make sure it is performant! Don’t do anything that takes a long time in this method.
  • Use caution with thread locals. They must be properly instantiated and cleared, and you need to be sure they will work in your deployment environment. Be aware of the implications if your request handling code spawns additional threads.
  • If you want to use this approach in production, think carefully about how to instantiate/reset the MetricsCollector. What happens if you run a query before your RequestDataMiddleware is evaluated? Your query code should work even if no MetricsCollector has been set up yet (one defensive pattern is sketched below).
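
For that last point, one option is to let the engine code fall back gracefully when no collector exists. Here’s a minimal sketch, under the same assumptions as the engine code above:

def execute(self, query, args=None):
    start = time.time()
    verb = query[:query.find(' ')]
    result = super(CustomCursorWrapper, self).execute(query, args)
    end = time.time()
    # getattr with a default handles queries that run before the middleware has
    # initialized the thread local, or that run outside a request entirely.
    collector = getattr(request_data, 'collector', None)
    if collector is not None:
        collector.record_query(verb, end - start)
    return result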

Extending Even Further

We execute code in our Django application in two additional contexts beyond request handling: Celery tasks and management commands. If you run a lot of asynchronous tasks, extending your query metrics to support them is a must. I’ll elide the implementation details for brevity, but the idea is to decouple the thread-local storage from the execution context. Instead of using request_data directly, you can encapsulate the concept of “local data” in a way that works in any execution context (a request, a task, or a management command). Then think about how to set up and tear down your MetricsCollector in each context – before/after a task run, and before/after a management command execution.
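
As an illustration of the Celery side only, here is a rough sketch that wires the collector to Celery’s task lifecycle signals. It reuses the request_data thread local directly for brevity; a real implementation would rename it to something execution-context neutral, per the point above:

from celery.signals import task_postrun, task_prerun

from myapp.metrics.collector import MetricsCollector
from myapp.middleware.request_data import request_data


@task_prerun.connect
def start_task_collector(sender=None, **kwargs):
    # Tag queries with the task name, analogous to view_name for requests.
    request_data.view_name = sender.name if sender else None
    request_data.collector = MetricsCollector()


@task_postrun.connect
def flush_task_collector(**kwargs):
    collector = getattr(request_data, 'collector', None)
    if collector is not None:
        collector.flush_metrics()
    request_data.view_name = None
    request_data.collector = None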

Final Words

We’ve described an approach to query performance observability in Django that is thorough and powerful. It will give you significant, and often surprising, insight into how your application performs with respect to your database. Once you’ve profiled your application this way, you can set up monitors to proactively identify performance problems before you encounter them as emergent issues at scale. At Rover we’ve used these metrics to identify N + 1 query problems in our views, slow queries in runaway tasks, and cron jobs unnecessarily taxing our database. These metrics continue to drive our performance and scalability improvements and give us deep visibility into how our application behaves.

If this kind of work interests you, my team is hiring!