Crushing, caching and CDN deployment in Django

Caching pages, or even portions of pages, is one of the easier ways to speed up a website. Minifying markup can help save space in memory-based data stores. Using a CDN means your users can potentially download the content from a location closer to where they are.

In this post I'll walk through performing all the above in a Django project.

Dependencies and boilerplate

The principal orchestrating package used for crushing is django-compressor. It's a ~5-year-old project that has had 233 closed tickets and 106 contributors. It uses a lot of other packages to crush various scripts (hence the long requirements list below).

Other packages that I'll use include django-redis-cache for communicating with the cache store and django-htmlmin for HTML minification.

The STATIC_DEPS=true prefix below is for lxml.

$ sudo apt install \
    libxml2-dev \
    libxslt-dev \
    python-dev \
    libzip-dev \
    redis-server

$ STATIC_DEPS=true pip install \
    Django==1.7.1 \
    django-compressor==1.4 \
    django-appconf==0.6 \
    BeautifulSoup==3.2.1 \
    html5lib==0.999 \
    slimit==0.8.1 \
    lxml==3.4.0 \
    django-redis-cache==0.13.0 \
    hiredis==0.1.5 \
    django-htmlmin==0.7.0

As of this writing Ubuntu's repositories finally have the latest, stable version of redis, 2.8.17 in this case:

$ redis-server --version
Redis server v=2.8.17 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=64186bb5bffe2061

I've created a new project called 'compressed', created an app called 'coupon' within it, set up the boilerplate database and, finally, downloaded Twitter Bootstrap and an example JPEG to an external folder. This folder will be one of the static folder's sources of content.

$ django-admin startproject compressed
$ cd compressed/
$ django-admin startapp coupon
$ mkdir templates static external
$ python manage.py syncdb --noinput

$ curl -L -O https://github.com/twbs/bootstrap/releases/download/v3.3.1/bootstrap-3.3.1-dist.zip
$ unzip bootstrap-3.3.1-dist.zip
$ mv dist/* external/
$ rm -r bootstrap-3.3.1-dist.zip dist

$ mkdir external/img
$ curl -o external/img/mark.jpg \
    http://tech.marksblogg.com/theme/images/mark.jpg
$ touch templates/base.html \
        external/css/app.css \
        external/js/app.js

I've also created external/css/app.css and external/js/app.js for some small, project-specific code. The file and folder layout of this project now looks like this:

$ tree
.
├── compressed
│   ├── __init__.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── coupon
│   ├── admin.py
│   ├── __init__.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
├── db.sqlite3
├── external
│   ├── css
│   │   ├── app.css
│   │   ├── bootstrap.css
│   │   ├── bootstrap.css.map
│   │   ├── bootstrap.min.css
│   │   ├── bootstrap-theme.css
│   │   ├── bootstrap-theme.css.map
│   │   └── bootstrap-theme.min.css
│   ├── fonts
│   │   ├── glyphicons-halflings-regular.eot
│   │   ├── glyphicons-halflings-regular.svg
│   │   ├── glyphicons-halflings-regular.ttf
│   │   └── glyphicons-halflings-regular.woff
│   ├── img
│   │   └── mark.jpg
│   └── js
│       ├── app.js
│       ├── bootstrap.js
│       ├── bootstrap.min.js
│       └── npm.js
├── manage.py
├── static
└── templates
    └── base.html

The codebase

In compressed/urls.py I mapped all URLs to a single view and added in a static content endpoint for times when the code is run in debug mode. When running on production I'll change STATIC_URL to a CDN endpoint stub.

from django.conf import settings
from django.conf.urls import patterns, include, url
from django.conf.urls.static import static

from coupon import views


urlpatterns = patterns('',
    url(r'^$', views.home, name='index'),
)

if settings.DEBUG:
    urlpatterns += static(settings.STATIC_URL,
                          document_root=settings.STATIC_ROOT)

I created a unit test for the single view. I didn't patch the cache file generation as I'll use the test to generate the cache files on a continuous integration server. With those files generated, I'll deploy them to a CDN.

A cleaner way to approach this would be to patch the unit tests so they don't create file artefacts and use a management command to generate the static cache files.
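
As a rough sketch of that cleaner approach (not something I've run against this project): django-compressor has an offline mode, switched on with the COMPRESS_OFFLINE setting, and a compress management command that pre-generates the cache files outside of a request or test run. The RUNNING_TESTS name and the sys.argv check below are hypothetical, my own illustration of how the test run could be told apart from a normal one; these lines would take the place of the COMPRESS_ENABLED line in compressed/settings.py.

```python
import sys

# Hypothetical toggle: True when started as `python manage.py test`.
RUNNING_TESTS = 'test' in sys.argv[1:2]

# Skip compression during tests so they leave no files behind.
COMPRESS_ENABLED = not RUNNING_TESTS

# Outside of tests, serve the files pre-generated by the compress command.
COMPRESS_OFFLINE = not RUNNING_TESTS

# The continuous integration server would then build static/CACHE with:
#   python manage.py collectstatic --noinput
#   python manage.py compress
```

One caveat with this sketch: in offline mode the production servers also need to be able to read the manifest that the compress command writes, so it would have to ship alongside the code.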

coupon/tests.py:

from django.core.urlresolvers import reverse
from django.test import TestCase
from django.test.client import Client


class ViewTest(TestCase):

    def setUp(self):
        self.client = Client()

    def test_index(self):
        resp = self.client.get(reverse('index'))
        self.assertEqual(resp.status_code, 200)

The single view in coupon/views.py renders a template and returns it.

from django.shortcuts import render


def home(request):
    return render(request, 'base.html')

I created a templates/base.html file which is mostly an example file from the Bootstrap project. I added template tags marking the sections of markup to compress and wrapped asset URLs with a static template tag so their path prefix can be controlled via settings.STATIC_URL.

{% load staticfiles %}
{% load compress %}

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Bootstrap 101 Template</title>

    {% compress css %}
    <link href="{% static "css/bootstrap.min.css" %}" rel="stylesheet">
    <link href="{% static "css/app.css" %}" rel="stylesheet">
    {% endcompress %}

    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
  </head>
  <body>
    <h1>Hello, world!</h1>
    <img src="{% static "img/mark.jpg" %}" />

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
    {% compress js %}
    <script src="{% static "js/bootstrap.min.js" %}"></script>
    <script src="{% static "js/app.js" %}"></script>
    {% endcompress %}
  </body>
</html>

The compressed/settings.py file is the longest and probably most complicated of the project. Here is a summary of the key settings being put in place:

Fetch the SECRET_KEY environment variable.

Add compressor to the installed apps tuple.

Add caching and minification middleware.

Use redis as a caching backend (using database #3).

Fetch the STATIC_URL environment variable and fall-back to using /static/ as the default value if the environment variable is unavailable.

Define the minification behaviours, key among them: KEEP_COMMENTS_ON_MINIFYING, which keeps HTML comments in place so the conditional comments for older versions of IE survive, and COMPRESS_CSS_HASHING_METHOD, which makes the cache-busting suffix that django-compressor appends to URLs inside the stylesheets a hash of each referenced file's contents rather than its modification time.

import os


BASE_DIR = os.path.dirname(os.path.dirname(__file__))

SECRET_KEY = os.environ.get('SECRET_KEY', None)
assert SECRET_KEY is not None, \
    'SECRET_KEY environment variable is needed'

DEBUG = True
TEMPLATE_DEBUG = True

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'compressor',
)

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',
    'htmlmin.middleware.HtmlMinifyMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
    'htmlmin.middleware.MarkRequestMiddleware',
)

ROOT_URLCONF = 'compressed.urls'
WSGI_APPLICATION = 'compressed.wsgi.application'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

CACHES = {
    'default': {
        'BACKEND': 'redis_cache.RedisCache',
        'LOCATION': '127.0.0.1:6379',
        'OPTIONS': {
            'DB': 3,
            'PARSER_CLASS': 'redis.connection.HiredisParser',
            'CONNECTION_POOL_CLASS': 'redis.BlockingConnectionPool',
            'CONNECTION_POOL_CLASS_KWARGS': {
                'max_connections': 25,
                'timeout': 4,
            }
        },
    },
}

LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

STATIC_URL = os.environ.get('STATIC_URL', '/static/')
assert STATIC_URL is not None and len(STATIC_URL), \
    'STATIC_URL environment variable is needed'

STATICFILES_DIRS = (
    os.path.join(BASE_DIR, "external"),
)

STATIC_ROOT = os.path.join(BASE_DIR, "static")

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'compressor.finders.CompressorFinder',
)

TEMPLATE_DIRS = [os.path.join(BASE_DIR, 'templates')]

COMPRESS_ENABLED = True
COMPRESS_CSS_HASHING_METHOD = 'content'
COMPRESS_CSS_FILTERS = [
    'compressor.filters.css_default.CssAbsoluteFilter',
    'compressor.filters.cssmin.CSSMinFilter',
]

HTML_MINIFY = True
KEEP_COMMENTS_ON_MINIFYING = True
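
The SECRET_KEY and STATIC_URL checks above rely on assert, which Python skips entirely when it is started with the -O flag. A small helper that raises an exception instead is one alternative. This is a minimal sketch; require_env is a hypothetical name of my own and isn't part of the project.

```python
import os


def require_env(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset.

    Raises RuntimeError when the result is missing or empty. Unlike an
    assert statement, this check still runs under `python -O`.
    """
    value = os.environ.get(name, default)
    if not value:
        raise RuntimeError('%s environment variable is needed' % name)
    return value


# Hypothetical usage inside compressed/settings.py:
# SECRET_KEY = require_env('SECRET_KEY')
# STATIC_URL = require_env('STATIC_URL', '/static/')
```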

The caching middleware above caches whole pages across the site. Django also supports per-view and template-fragment caching, which can be useful on pages with dynamic content.

Finally, to help demonstrate asset concatenation, I've added the following frontend files:

external/css/app.css:

h1 {
    display: none;
    text-align: center;
}

external/js/app.js:

$( document ).ready(function() {
    $('h1').fadeIn({'duration': 2000})
});

Initialising Django

When setting environment variables I like to use read. It lets you type or paste in the value, which means the value won't appear when you run history.

$ read SECRET_KEY
0x$01e345$)v74-!k9gfdzhmybw0=-0g+1jh@v#8-=s+^_
$ export SECRET_KEY

$ history | tail
...
 1320  read SECRET_KEY
 1321  export SECRET_KEY

With the settings in place I've run collectstatic, which takes static files from the external folder and from any installed applications (such as Django's admin) and places them in the static folder.

$ python manage.py collectstatic --noinput
Copying '/home/mark/compressed/external/img/mark.jpg'
Copying '/home/mark/compressed/external/css/bootstrap.min.css'
...
Copying '/home/mark/compressed/external/fonts/glyphicons-halflings-regular.svg'

16 static files copied to '/home/mark/compressed/static'.

Making a web request

I started up the reference WSGI server, cleaned redis' database #3 and fetched the project's homepage:

$ python manage.py runserver &
$ redis-cli -n 3 flushdb
OK

$ curl --silent localhost:8000 | fold -w50
<!DOCTYPE html><html lang="en"><head><meta charset
="utf-8"/><meta content="IE=edge" http-equiv="X-UA
-Compatible"/><meta content="width=device-width, i
nitial-scale=1" name="viewport"/><title>Bootstrap
101 Template</title><link href="/static/CACHE/css/
1afa57f03e30.css" rel="stylesheet" type="text/css"
/><!--[if lt IE 9]><script src="https://oss.maxcdn
.com/html5shiv/3.7.2/html5shiv.min.js"></script> <
script src="https://oss.maxcdn.com/respond/1.4.2/r
espond.min.js"></script><![endif]--></head><body><
h1>Hello, world!</h1><img src="/static/img/mark.jp
g"/><script src="https://ajax.googleapis.com/ajax/
libs/jquery/1.11.1/jquery.min.js"></script><script
 src="/static/CACHE/js/2c7f836f5f1d.js" type="text
/javascript"></script></body></html>

As you can see, there is no excess white space in the HTML, and the two CSS files are now concatenated into /static/CACHE/css/1afa57f03e30.css.
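
To illustrate why a content-derived file name is handy for cache busting, here is a minimal sketch of the general idea. It is not django-compressor's actual code: the cache_name function is hypothetical, and the choice of MD5 truncated to 12 characters is an assumption made only to match the length of the names shown above.

```python
import hashlib


def cache_name(sources, extension, length=12):
    """Join `sources` (byte strings) and name the result after its digest."""
    combined = b'\n'.join(sources)
    digest = hashlib.md5(combined).hexdigest()[:length]
    return '%s.%s' % (digest, extension), combined


name_a, _ = cache_name([b'h1{display:none}', b'h1{text-align:center}'], 'css')
name_b, _ = cache_name([b'h1{display:none}', b'h1{text-align:left}'], 'css')

# Any change to the contents produces a different file name, so browsers
# and CDNs treat the new file as a brand new URL.
print(name_a, name_b, name_a != name_b)
```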

If I run the response through gzip it shows the content is only 440 bytes, about half of its original size.

$ curl --silent localhost:8000 | gzip | wc -c
440
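
The same kind of measurement can be made from Python. The sketch below uses a made-up snippet of markup and a crude stand-in for a minifier (stripping indentation and line breaks), so its numbers won't match the 440 bytes above; it only shows how the raw and gzipped sizes can be compared.

```python
import gzip

pretty = b"""<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Bootstrap 101 Template</title>
  </head>
  <body>
    <h1>Hello, world!</h1>
  </body>
</html>
"""

# Crude stand-in for a real minifier: drop indentation and line breaks.
minified = b''.join(line.strip() for line in pretty.splitlines())

# Print the raw size and the gzipped size of each variant.
for label, body in (('pretty', pretty), ('minified', minified)):
    print(label, len(body), len(gzip.compress(body)))
```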

I ran the monitor command on redis and made a subsequent request for the homepage. I could see the caches for the headers and body fetched by Django:

$ redis-cli -n 3 monitor
OK
1415882400.531810 [3 127.0.0.1:43471] "GET" ":1:views.decorators.cache.cache_header..8f95444a9dd16027d8bb2b1e8ed2fb75.en-us.UTC"
1415882400.532644 [3 127.0.0.1:43471] "GET" ":1:views.decorators.cache.cache_page..GET.8f95444a9dd16027d8bb2b1e8ed2fb75.d41d8cd98f00b204e9800998ecf8427e.en-us.UTC"
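
The second key contains two 32-character hex digests. My reading, which is an assumption about Django's internals rather than something shown in this post, is that the first identifies the URL and the second summarises the request headers the response varies on. The second one can at least be checked independently: it is the MD5 digest of an empty string, which fits a response that varies on no headers.

```python
import hashlib

key = (':1:views.decorators.cache.cache_page..GET.'
       '8f95444a9dd16027d8bb2b1e8ed2fb75.'
       'd41d8cd98f00b204e9800998ecf8427e.en-us.UTC')

# Pick out the two 32-character hex digests embedded in the key.
digests = [part for part in key.split('.') if len(part) == 32]
url_digest, vary_digest = digests

# Compare the second digest with the MD5 of an empty byte string.
print(vary_digest == hashlib.md5(b'').hexdigest())
```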

Embedding assets within HTML

If you're running a single-page application then external assets might not be faster than embedding them within the HTML. When no further pages will be loaded to reuse a cached asset, the web request needed to fetch each asset is just extra overhead. django-compressor supports an inline flag in its template tag which will load the assets (if they're on the local system) and place them inline.

{% compress css inline %}
<link href="{% static "css/bootstrap.min.css" %}" rel="stylesheet">
<link href="{% static "css/app.css" %}" rel="stylesheet">
{% endcompress %}

Now the contents of css/bootstrap.min.css and css/app.css will be embedded in the page.

$ curl --silent localhost:8000 | fold -w50 | head -n20
<!DOCTYPE html><html lang="en"><head><meta charset
="utf-8"/><meta content="IE=edge" http-equiv="X-UA
-Compatible"/><meta content="width=device-width, i
nitial-scale=1" name="viewport"/><title>Bootstrap
101 Template</title><style type="text/css">/*!* Bo
otstrap v3.3.1(http://getbootstrap.com) * Copyrigh
t 2011-2014 Twitter,Inc. * Licensed under MIT(http
s://github.com/twbs/bootstrap/blob/master/LICENSE)
 *//*!normalize.css v3.0.2 | MIT License | git.io/
normalize */html{font-family:sans-serif;-webkit-te
xt-size-adjust:100%;-ms-text-size-adjust:100%}body
{margin:0}article,aside,details,figcaption,figure,
footer,header,hgroup,main,menu,nav,section,summary
{display:block}audio,canvas,progress,video{display
:inline-block;vertical-align:baseline}audio:not([c
ontrols]){display:none;height:0}[hidden],template{
display:none}a{background-color:transparent}a:acti
ve,a:hover{outline:0}abbr[title]{border-bottom:1px
 dotted}b,strong{font-weight:700}dfn{font-style:it
alic}h1{margin:.67em 0;font-size:2em}mark{color:#0

The weight of the page totals 28,424 bytes when piped through gzip.

$ curl --silent localhost:8000 | gzip | wc -c
28424

Batch image compression

I try to always crush images before committing them to a code base rather than leave them for a request process or continuous integration server to handle. It's a CPU-intensive process that only needs to be run once and few compression tools are smart enough to know ahead of time if they'll make a significant dent in the file size or not.

I have a script which relies on mozjpeg, gifsicle and optipng to crush JPEG, GIF and PNG-formatted images within a given folder and its child folders.

Below are some condensed installation commands. As of this writing Mozilla doesn't provide a binary for mozjpeg so I've found a community-built one instead:

$ sudo apt install \
    optipng \
    gifsicle

$ curl -O http://mozjpeg.codelove.de/bin/libmozjpeg_2.1_amd64.deb
$ sudo dpkg -i libmozjpeg_2.1_amd64.deb
$ sudo ln -s /opt/libmozjpeg/bin/jpegtran /usr/bin/mozjpeg

The image crushing script itself:

find . -iname "*.jpg" -type f \
    -exec mozjpeg -copy none -optimize -outfile {} {} \;
find . -iname '*.gif' -type f \
    -exec gifsicle -O2 -b {} \;
find . -iname '*.png' -type f \
    -exec optipng -o9 -q {} \;
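
For anyone who would rather drive this from Python (inside a Fabric task, say), the script could be sketched as below. The command lines are copied from the shell version above; the command_for and crush functions and the dry_run parameter are hypothetical names of my own, and the three executables still need to be installed and on the PATH.

```python
import os
import subprocess

# Argument templates copied from the shell script; {path} is filled in per file.
COMMANDS = {
    '.jpg': ['mozjpeg', '-copy', 'none', '-optimize',
             '-outfile', '{path}', '{path}'],
    '.gif': ['gifsicle', '-O2', '-b', '{path}'],
    '.png': ['optipng', '-o9', '-q', '{path}'],
}


def command_for(path):
    """Return the argument list for `path`, or None for other file types."""
    extension = os.path.splitext(path)[1].lower()
    template = COMMANDS.get(extension)
    if template is None:
        return None
    return [arg.format(path=path) for arg in template]


def crush(root='.', dry_run=True):
    """Walk `root` and run (or, by default, just print) each command."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            command = command_for(os.path.join(folder, name))
            if command is None:
                continue
            if dry_run:
                print(' '.join(command))
            else:
                subprocess.check_call(command)
```

Calling crush('external') prints the commands that would be run; crush('external', dry_run=False) executes them.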

Deploying static contents to a CDN

The django-compressor package supports working with django-storages which can upload your static contents onto various cloud platforms. Personally I'd rather this step was taken by a continuous integration system.

Leaving django-storages out means one fewer package to rely on. Even if django-storages' collectstatic command was only run on a continuous integration server, it would still be a package with its own dependencies that developers need to keep in mind and re-test whenever changes are made.

By relying on just changing the STATIC_URL environment variable the code base is then not only cloud-agnostic but cloud-ignorant.

If a rollback were needed, the continuous integration server could simply compile the assets and deploy again, leaving less for a developer to do. If new buckets and distributions are created during each deployment then the addresses from previous deployments could be used as settings in the rolled-back code base's environment variables.

Keep in mind this is only useful for projects that don't push user uploads onto cloud storage. If this code base were doing so then django-storages would be a necessity.

Before I describe the strategy I'll point out that s3cmd version 1.1.0-beta3, which is distributed via Ubuntu's repositories, has an issue communicating with Amazon CloudFront.

$ s3cmd --version
s3cmd version 1.1.0-beta3

$ s3cmd cfcreate s3://deployment-dfshk33/
.ERROR: S3 error: 400 (InvalidOrigin): The specified origin server does not exist or is not valid.

This issue has been fixed in later versions of s3cmd so I've removed the older version from my system and installed a newer version in my virtual environment (since s3cmd is written in Python). I've also installed shortuuid which will also be used in this deployment strategy.

$ sudo apt remove s3cmd
$ source ~/.bashrc
$ pip install python-dateutil==2.2 \
              shortuuid==0.4.2 \
              https://github.com/s3tools/s3cmd/archive/v1.5.0-rc1.zip#egg=s3cmd
$ s3cmd --version
s3cmd version 1.5.0-rc1

On the continuous integration server, after the code base has been checked out and requirements installed, I'll run the unit tests. This both tests the code and causes the cache files to generate:

$ python manage.py test
$ python manage.py collectstatic --noinput
$ tree static/CACHE
static/CACHE
├── css
│   └── 7aea4b813422.css
└── js
    └── 772c8931bdf3.js

I will then generate a short UUID that will suffix my Amazon S3 bucket name:

In [1]: import shortuuid

In [2]: shortuuid.ShortUUID().random(length=6)
Out[2]: '8VQUpK'
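
If pulling in shortuuid feels heavy for a six-character suffix, the standard library can do a similar job. This is a swapped-in technique rather than what I used above. The bucket_suffix function is a hypothetical name, and sticking to lowercase letters and digits is my own choice so the bucket name stays usable as a host name.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits


def bucket_suffix(length=6):
    """Return `length` random lowercase letters and digits."""
    rng = random.SystemRandom()  # draws from the operating system's entropy source
    return ''.join(rng.choice(ALPHABET) for _ in range(length))


print('deployment-' + bucket_suffix())
```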

I will then create a bucket using that suffix and sync my static contents into it:

$ s3cmd mb s3://deployment-8VQUpK
$ s3cmd sync static/ s3://deployment-8VQUpK \
  --acl-public --guess-mime-type

I will then create a CloudFront distribution for the above S3 bucket and wait for it to deploy (it took around five minutes when I did this):

$ s3cmd cfcreate s3://deployment-8VQUpK/
Distribution created:
Origin:         s3://deployment-8VQUpK/
DistId:         cf://E19HG43AR6N7ME
DomainName:     d2r3jcgbupqrcn.cloudfront.net
CNAMEs:
Comment:        http://deployment-8VQUpK.s3.amazonaws.com/
Status:         InProgress
Enabled:        True
DefaultRootObject: None
Etag:           E39EDVTTHIFEWU

$ s3cmd cfinfo cf://E19HG43AR6N7ME | grep -oP 'Status\: .*'
Status:         InProgress

Waiting, waiting, waiting...

$ s3cmd cfinfo cf://E19HG43AR6N7ME | grep -oP 'Status\: .*'
Status:         Deployed
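
On a continuous integration server the waiting would need to be automated. Below is a minimal sketch that polls the same s3cmd cfinfo command until the status reads Deployed. The function names, the polling interval and the attempt limit are all hypothetical choices of mine, and the parsing assumes the output keeps the layout shown above.

```python
import re
import subprocess
import time

STATUS_RE = re.compile(r'^Status:\s*(\S+)', re.MULTILINE)


def parse_status(cfinfo_output):
    """Extract the Status value from `s3cmd cfinfo` output, or None."""
    match = STATUS_RE.search(cfinfo_output)
    return match.group(1) if match else None


def fetch_cfinfo(dist_uri):
    """Run `s3cmd cfinfo` for a distribution and return its text output."""
    output = subprocess.check_output(['s3cmd', 'cfinfo', dist_uri])
    return output.decode('utf-8')


def wait_until_deployed(dist_uri, fetch=fetch_cfinfo, interval=30, attempts=40):
    """Poll until the status is 'Deployed'; return False if attempts run out."""
    for _ in range(attempts):
        if parse_status(fetch(dist_uri)) == 'Deployed':
            return True
        time.sleep(interval)
    return False
```

Passing the fetch function in as a parameter keeps the polling loop testable without touching AWS.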

Once the bucket and distribution are working, code can be deployed to production servers with the STATIC_URL environment variable set to the protocol-relative CloudFront URL stub. Every static asset URL then carries a new host name, which busts any older caches.

On the production server(s):

$ ... install code base ...
$ STATIC_URL=//d2r3jcgbupqrcn.cloudfront.net/
$ export STATIC_URL
$ redis-cli -n 3 flushdb
$ ... restart supervisor ...

The above could be wrapped up into a Fabric script that records the bucket, distribution and git commit details.

...
0c744b6, deployment-8VQUpK, d2r3jcgbupqrcn.cloudfront.net, E19HG43AR6N7ME
366439f, deployment-97Q3s2, feabcefghujsds.cloudfront.net, F22EE44AA66711

Keeping the buckets and distributions around for a while means a rollback would only need the code deployed to production with STATIC_URL set to the CloudFront URL stub of that commit ID. There would be no need to set up another bucket and distribution.

There could be a clean-up script that removes older buckets and distributions once they're no longer useful for a rollback.
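
Looking up the right STATIC_URL for a rollback could be as simple as the sketch below. It assumes the log keeps the four comma-separated columns shown above (commit, bucket, CloudFront domain, distribution ID); the static_url_for function is a hypothetical name of my own.

```python
import csv
import io

# Sample records in the same layout as the deployment log above.
LOG = """\
0c744b6, deployment-8VQUpK, d2r3jcgbupqrcn.cloudfront.net, E19HG43AR6N7ME
366439f, deployment-97Q3s2, feabcefghujsds.cloudfront.net, F22EE44AA66711
"""


def static_url_for(commit, log_text):
    """Return the protocol-relative STATIC_URL recorded for `commit`, or None."""
    reader = csv.reader(io.StringIO(log_text), skipinitialspace=True)
    for row in reader:
        if len(row) == 4 and row[0] == commit:
            return '//%s/' % row[2]
    return None


print(static_url_for('0c744b6', LOG))
```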
