Passwords in Django

Django ships with a contributed module called auth. It provides user accounts and a large amount of authentication functionality that can be used in Django projects.

In this blog post I'll examine the auth module's raw storage of hashed passwords, password hash upgrading, and remote denial-of-service (DoS) attacks using large passwords.

Start with a small Django project

I've created a small Django project for this blog post:

$ pip install Django==1.7.3

$ django-admin startproject passwords

The file and folder layout of this project looks like this:

$ tree
.
├── manage.py
└── passwords
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

The django.contrib.auth module is included in settings.INSTALLED_APPS by default when generating a Django boilerplate project. When I run the migrate command, Django will create the database tables for the auth module:

$ python manage.py migrate
Operations to perform:
  Apply all migrations: admin, contenttypes, auth, sessions
Running migrations:
  ...
  Applying auth.0001_initial... OK
  ...

Tokenised Password Storage

I'll create an example user and examine the raw password field:

$ python manage.py shell

>>> from django.contrib.auth.models import User

>>> user = User(username="tester")
>>> user.set_password('testing')
>>> user.save()

>>> user.password
u'pbkdf2_sha256$15000$Pjun1TMGEQnM$lShdzU33covbDNiqGVDffdHh/86VaECJlaaNXchT0ew='

When using the set_password method, the first (and presumably the strongest) password hasher that Django is configured with will be used. In the case of Django 1.7.3 it's PBKDF2 with HMAC and SHA256, with 15,000 hashing iterations performed on the original password string.

The raw password field has four parts: the hashing algorithm, the iteration count, the salt and finally the password hash itself. These are delimited by dollar signs:

>>> from pprint import pprint
>>> pprint(user.password.split('$'))
[u'pbkdf2_sha256',
 u'15000',
 u'Pjun1TMGEQnM',
 u'lShdzU33covbDNiqGVDffdHh/86VaECJlaaNXchT0ew=']
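The same four parts can be taken apart and recomputed with nothing but the standard library. This is a minimal sketch, not Django's own code: the helper names parse_encoded, pbkdf2_sha256_hash and check are hypothetical, and it assumes the fourth field is the base64 encoding of the raw 32-byte PBKDF2 output, which is my understanding of the format.

```python
import base64
import hashlib
import hmac
from collections import namedtuple

EncodedPassword = namedtuple("EncodedPassword", "algorithm iterations salt hash")


def parse_encoded(encoded):
    """Split a stored PBKDF2 value into its four dollar-delimited parts."""
    algorithm, iterations, salt, digest = encoded.split("$", 3)
    return EncodedPassword(algorithm, int(iterations), salt, digest)


def pbkdf2_sha256_hash(password, salt, iterations):
    """Return the base64-encoded PBKDF2-HMAC-SHA256 digest of a password."""
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              salt.encode("utf-8"), iterations)
    return base64.b64encode(raw).decode("ascii")


def check(password, encoded):
    """Recompute the hash with the stored salt and iteration count, then compare."""
    parts = parse_encoded(encoded)
    candidate = pbkdf2_sha256_hash(password, parts.salt, parts.iterations)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate, parts.hash)


stored = "pbkdf2_sha256$15000$Pjun1TMGEQnM$lShdzU33covbDNiqGVDffdHh/86VaECJlaaNXchT0ew="
print(parse_encoded(stored))
```

If the assumption about the format holds, check('testing', stored) should return True for the hash shown above. The point is that the salt and iteration count travel with the hash, so verification needs no other state.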

Django Upgrades Passwords

One of the features of the auth module I like the most is password hash upgrading. When a user successfully authenticates, Django checks whether their stored hash uses the first configured hasher and its iteration count. If it uses any other algorithm or iteration count, Django takes the raw password that was just supplied and re-hashes it using the first configured hasher.

If you were to create a password in Django 1.7.3 that was hashed with anything other than PBKDF2 at the default iteration count, the password hash would be upgraded to PBKDF2 with 15,000 iterations the next time that user logs in successfully.
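The decision described above can be sketched in a few lines. This is an illustration of the rule rather than Django's implementation: needs_rehash and the two constants are hypothetical names, with values taken from the Django 1.7.3 defaults quoted in this post.

```python
PREFERRED_ALGORITHM = "pbkdf2_sha256"  # the first configured hasher in this sketch
PREFERRED_ITERATIONS = 15000           # Django 1.7.3's PBKDF2 iteration count


def needs_rehash(encoded):
    """Return True when a stored value differs from the preferred settings."""
    parts = encoded.split("$")
    if parts[0] != PREFERRED_ALGORITHM:
        # Covers other algorithms and unsalted hashes, which contain no '$'.
        return True
    # Any different count triggers a re-hash, whether higher or lower.
    return int(parts[1]) != PREFERRED_ITERATIONS


print(needs_rehash("ae2b1fca515949e5d54fb22b8ed95575"))  # True
print(needs_rehash(
    "pbkdf2_sha256$15000$Pjun1TMGEQnM$lShdzU33covbDNiqGVDffdHh/86VaECJlaaNXchT0ew="
))  # False
```

The re-hash can only happen at login because that is the one moment the server holds the raw password.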

Here I've created an account with the password 'testing' and stored it as an unsalted MD5 hash:

>>> from django.contrib.auth.hashers import make_password

>>> user.password = make_password(password='testing',
                                  salt=None,
                                  hasher='unsalted_md5')
>>> user.save()

>>> user.password
'ae2b1fca515949e5d54fb22b8ed95575'

As soon as I authenticate the account the password is re-hashed with PBKDF2:

>>> from django.contrib.auth import authenticate

>>> authenticate(username='tester', password='testing')
>>> user = User.objects.get(username='tester')

>>> user.password
u'pbkdf2_sha256$15000$lPSA3r6AwELv$/6Frb75xtX5xmA8Ezcnl0UxPmHpUaeleY+QqM/dMRLw='

As of version 1.7.3, Django supports nine hashing types and variations, which is handy for importing user accounts and their password hashes from other systems:

>>> from django.contrib.auth import hashers
>>> from pprint import pprint

>>> pprint(hashers.HASHERS.keys())
[u'bcrypt_sha256',
 u'sha1',
 u'pbkdf2_sha256',
 u'pbkdf2_sha1',
 u'crypt',
 u'unsalted_md5',
 u'unsalted_sha1',
 u'bcrypt',
 u'md5']

Downgrades as well as Upgrades

The only downside to password hash 'upgrading' is that if you do use PBKDF2 but use a higher iteration count than what is hard-coded into the PBKDF2PasswordHasher class in django.contrib.auth.hashers then the iteration count will be downgraded to that given hard-coded value.

Django 1.7.3 sets this iteration count to 15,000, Django 1.8 sets it to 20,000, and the current master branch of Django, which targets the 1.9 release, sets it to 24,000.

Here is a demonstration of a downgrade using Django 1.7.3. I've hashed a password 50,000 times and it's downgraded to 15,000 after authentication:

>>> from django.contrib.auth import authenticate
>>> from django.contrib.auth.hashers import PBKDF2PasswordHasher

>>> hasher = PBKDF2PasswordHasher()
>>> user.password = hasher.encode(password='password',
                                  salt='salt',
                                  iterations=50000)
>>> user.save()

>>> user = authenticate(username='tester', password='password')

>>> user.password
u'pbkdf2_sha256$15000$NdqimFkxkuIe$YXO6x1A4XlVaFyu6V+Y/pXHnwpmNAcyFeX88R4JXf1k='
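One way to avoid the downgrade, as I understand it, is to subclass the hasher with a higher iteration count and list the subclass first in the PASSWORD_HASHERS setting. The sketch below makes some assumptions: StrongerPBKDF2PasswordHasher and the passwords.hashers module path are hypothetical names, and I haven't verified the behaviour against 1.7.3 specifically.

```python
try:
    from django.contrib.auth.hashers import PBKDF2PasswordHasher
except ImportError:
    # Stand-in so the sketch can be read and run without Django installed.
    class PBKDF2PasswordHasher(object):
        algorithm = "pbkdf2_sha256"
        iterations = 15000


class StrongerPBKDF2PasswordHasher(PBKDF2PasswordHasher):
    """Hypothetical subclass that raises the iteration count to 50,000."""
    iterations = 50000


# In settings.py the subclass would be listed first so that it becomes the
# preferred hasher. "passwords.hashers" is an assumed module path, and any
# other hashers the project still needs would follow these two.
PASSWORD_HASHERS = (
    "passwords.hashers.StrongerPBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
)
```

With a setup like this, a 50,000-iteration hash should match the preferred hasher's count and be left alone after login, while 15,000-iteration hashes would be upgraded instead.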

Why is computationally expensive hashing so important?

If the raw password hashes, salts and iteration counts were ever exposed, it would be extremely computationally expensive to guess the original password used to create each hash... or it would be if the passwords were hashed with PBKDF2, HMAC, SHA256 and tens of thousands of iterations. If you use a weak hashing algorithm, such as unsalted MD5, then it becomes trivial to find out what the original passwords are.

To demonstrate, I'll create an unsalted MD5 hash for the string 'elephant123':

>>> from django.contrib.auth.hashers import make_password

>>> make_password(password='elephant123',
                  salt=None,
                  hasher='unsalted_md5')
'e68a95aadb0c73dfd968513174de4ddf'

If I paste e68a95aadb0c73dfd968513174de4ddf into the form on md5crack it tells me in less than a second that 'elephant123' is a possible string that the hash e68a95aadb0c73dfd968513174de4ddf represents.
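A lookup like that is cheap because every guess costs a single MD5 call and there is no salt to vary between accounts. Here is a minimal sketch of a dictionary attack, assuming unsalted_md5 is nothing more than the hex MD5 digest of the password. The function names and the tiny word list are made up for illustration.

```python
import hashlib


def unsalted_md5(password):
    """Hex MD5 digest of a password, with no salt and a single iteration."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash, candidates):
    """Return the first candidate whose hash matches, or None."""
    for word in candidates:
        if unsalted_md5(word) == target_hash:
            return word
    return None


# Tiny illustrative word list; real lists hold millions of entries.
wordlist = ["password", "letmein", "elephant", "elephant123", "qwerty"]
target = unsalted_md5("elephant123")
print(dictionary_attack(target, wordlist))  # elephant123
```

Because the hash is unsalted, the same precomputed table works against every account in a leaked database, which is what makes services like this practical.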

MD5 has demonstrated attacks and SHA-1 hashes could be broken with a large amount of computing power, but SHA256 (which Django auth's PBKDF2 class is configured to work with) has no known complete attacks. And just in case, having a configurable iteration count means password hashes can be upgraded as computers become more powerful.

Why are some passwords too long for Django?

Part of the security of strong hashing algorithms is the amount of computing power needed to create a hash in the first place.

For example, when you attempt to log into Django admin, the password you supply is hashed using the same hashing algorithm as the account you're logging into to see if the hashes match. If you supply a 5 character long password then the amount of computing resources needed to create the hash will be a lot less than if you supply an 8,000 character long password.

For this reason Django limits the maximum length of a password to 4,096 characters. This was a result of the remote DoS attack disclosure in 2013, where attackers could overwhelm a Django installation by trying to authenticate with large passwords.
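The idea of capping the length before doing any hashing work can be sketched with the standard library. This is not Django's implementation: guarded_hash and MAX_PASSWORD_LENGTH are hypothetical names, and the limit is simply the figure quoted above.

```python
import hashlib

MAX_PASSWORD_LENGTH = 4096  # the limit quoted above


def guarded_hash(password, salt, iterations=15000):
    """Hash a password with PBKDF2, refusing oversized input before any work."""
    if len(password) > MAX_PASSWORD_LENGTH:
        raise ValueError("password exceeds %d characters" % MAX_PASSWORD_LENGTH)
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt.encode("utf-8"), iterations)
```

The check comes first on purpose: an oversized password is rejected after a single length comparison, so it never reaches the expensive part.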

In my opinion, even a 4,096 character limit is a bit long. To give an idea of the computing resources a 4,096 character password could consume, consider the following, where I hash a 4,096 character password 500 times:

>>> from timeit import timeit

>>> setup = '''
from django.contrib.auth.hashers import PBKDF2PasswordHasher
hasher = PBKDF2PasswordHasher()
password = "a" * 4096
salt = "salt"
'''

>>> timeit(stmt='''hasher.encode(password=password,
                                 salt=salt,
                                 iterations=24000)''',
           setup=setup,
           number=500)
25.06037712097168

It takes 25 seconds to finish hashing on an Intel Core i5 4670 processor running at 3.4GHz on Ubuntu 14.

Imagine 500 simultaneous connections to a Django admin page each trying to authenticate with a bogus 4,096 character password. If it took ~25 seconds to process all of those requests then there wouldn't be any resources available for any other requests during that time period.

This attack wouldn't require a large number of machines either; a single box could perform the 500 requests if the victim's setup didn't rate limit individual IP addresses.
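To show what rate limiting individual IP addresses might look like, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the LoginRateLimiter class, its defaults, and the idea of calling allow() before any password hashing takes place. A real deployment would more likely do this in a shared store or at the web server or proxy layer.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter(object):
    """Allow at most max_attempts login attempts per IP in a sliding window."""

    def __init__(self, max_attempts=5, window_seconds=60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # IP address -> attempt timestamps

    def allow(self, ip_address, now=None):
        now = time.time() if now is None else now
        attempts = self._attempts[ip_address]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


limiter = LoginRateLimiter(max_attempts=3, window_seconds=10)
print([limiter.allow("203.0.113.7", now=0) for _ in range(4)])
```

Refusing a request at this stage costs a dictionary lookup and a few comparisons, far less than a single PBKDF2 hash.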
