
Posted on Mon 10 November 2014

Better Python Package Management

Donald Stufft, one of the core contributors to pip, wrote a blog post last year detailing four days of traffic to PyPI's CDN provider. One of the metrics showed that pip is used for 75% of installations using PyPI. The tool has proved so useful and popular that in October 2013 it was agreed pip would be included alongside Python from version 3.4 onward.

I've been served well by pip for my entire time as a Pythonista but occasionally I require functionality that is unavailable in pip itself. Below I explore some solutions to those edge cases.

Running the latest version of pip

Before executing any commands quoted in this post, make sure your version of pip is reasonably up to date. As of this writing, 1.5.4 is the latest release packaged for Ubuntu and 1.5.6 is the latest version on GitHub.

$ pip --version
pip 1.1 ... (python 2.7)

$ sudo apt-get update
$ sudo apt-get install --only-upgrade python-pip
$ pip --version
pip 1.5.4 ... (python 2.7)

If you're running Python 3.4 or higher then this is a non-issue as pip is already packaged with your Python installation.
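If you want to automate that check in a provisioning script, the Python sketch below parses the first line of `pip --version` output and compares it against a minimum. `MINIMUM_PIP` and the helper names are hypothetical. It also assumes plain numeric versions like 1.5.4, not pre-releases such as 1.5rc1.

```python
import re

# Assumed minimum: the 1.5.4 release mentioned above.
MINIMUM_PIP = "1.5.4"


def version_tuple(version):
    """Turn '1.5.4' into (1, 5, 4); assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def pip_version_from_output(output):
    """Pull the version number out of a line such as 'pip 1.1 ... (python 2.7)'."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("unrecognised pip --version output: %r" % output)
    return match.group(1)


def needs_upgrade(output, minimum=MINIMUM_PIP):
    """True when the pip reported in `output` is older than `minimum`."""
    return version_tuple(pip_version_from_output(output)) < version_tuple(minimum)


print(needs_upgrade("pip 1.1 ... (python 2.7)"))    # True
print(needs_upgrade("pip 1.5.4 ... (python 2.7)"))  # False
```

In practice you would pass it the decoded output of `pip --version`.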

See all versions available

pip can search PyPI by package name, but if you want to see every version of a package that is available from PyPI, yolk can be useful:

$ pip install yolk
$ yolk -V django
Django 1.7.1
Django 1.7
Django 1.6.8
Django 1.6.7
...
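If you'd rather script this, the same information is available from PyPI's JSON API. The sketch below assumes the endpoint `https://pypi.org/pypi/<name>/json`, which returns a `releases` object keyed by version string. This isn't necessarily how yolk queries PyPI. To keep the sort simple, it only lists purely numeric versions.

```python
import json
from urllib.request import urlopen

# Assumption: PyPI's JSON endpoint, which returns a "releases" object keyed by
# version string. This is not necessarily how yolk talks to PyPI.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_json(name):
    """Download a package's metadata from PyPI (requires network access)."""
    with urlopen(PYPI_JSON_URL.format(name=name)) as response:
        return json.loads(response.read().decode("utf-8"))


def final_releases(pypi_json):
    """Return purely numeric versions (e.g. 1.7.1), newest first.

    Pre-releases such as 1.7c1 are skipped because the naive numeric sort
    below can't order them.
    """
    versions = [
        version for version in pypi_json["releases"]
        if all(part.isdigit() for part in version.split("."))
    ]
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")), reverse=True)


# Offline example with a trimmed-down response of the same shape.
sample = {"releases": {"1.6.8": [], "1.7": [], "1.7c1": [], "1.7.1": [], "1.6.7": []}}
print(final_releases(sample))  # ['1.7.1', '1.7', '1.6.8', '1.6.7']
```

With network access, `final_releases(fetch_pypi_json("Django"))[:4]` gives the newest four stable Django releases.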

Update the outdated

Finding outdated packages via pip has been possible for some time:

$ pip list --outdated
setuptools (Current: 2.2 Latest: 7.0)
pip (Current: 1.5.4 Latest: 1.5.6)
ansible (Current: 1.7.1 Latest: 1.7.2)
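pip does this comparison for you, but the logic is simple enough to sketch. The Python example below takes two name-to-version mappings and returns lines in the same style. Where the "latest" data comes from (for example PyPI's JSON API) is left as an assumption. The installed and latest versions reuse the output above, with Jinja2 added as an up-to-date package.

```python
def version_tuple(version):
    # Assumes plain dotted numeric versions such as "1.5.4".
    return tuple(int(part) for part in version.split("."))


def outdated(installed, latest):
    """Return pip-list-style lines for packages older than the latest release.

    installed and latest map package names to version strings; packages
    missing from `latest` are ignored.
    """
    lines = []
    for name in sorted(installed, key=str.lower):
        current, newest = installed[name], latest.get(name)
        if newest is not None and version_tuple(newest) > version_tuple(current):
            lines.append("%s (Current: %s Latest: %s)" % (name, current, newest))
    return lines


installed = {"setuptools": "2.2", "pip": "1.5.4", "ansible": "1.7.1", "Jinja2": "2.7.3"}
latest = {"setuptools": "7.0", "pip": "1.5.6", "ansible": "1.7.2", "Jinja2": "2.7.3"}
for line in outdated(installed, latest):
    print(line)
```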

If you want to update all outdated packages at once, or pick and choose interactively, the pip-review tool included in pip-tools does a good job.

For example, I'll install some older versions of Django and Ansible:

$ pip install 'ansible<1.5' 'Django<1.5'
$ pip freeze | grep -i 'django\|ansible'
Django==1.4.16
ansible==1.4.5

With one command I can update them to their latest stable versions:

$ pip-review --auto
Downloading/unpacking Django==1.7.1
...
  Found existing installation: Django 1.4.16
    Uninstalling Django:
      Successfully uninstalled Django
Successfully installed Django
...
Downloading/unpacking ansible==1.7.2
...
Successfully installed ansible

$ pip freeze | grep -i 'django\|ansible'
Django==1.7.1
ansible==1.7.2
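A non-interactive "update everything" roughly amounts to pinning each outdated package to its newest version and handing that list to pip. The sketch below is not pip-review's implementation, and the function names are hypothetical. It assumes you already have a name-to-latest-version mapping, for example from the outdated check. It runs pip as `sys.executable -m pip` so the current interpreter's pip is used. It also defaults to a dry run.

```python
import subprocess
import sys


def upgrade_command(latest_versions):
    """Build one pip command that pins every package to its latest version."""
    pins = ["%s==%s" % (name, version) for name, version in sorted(latest_versions.items())]
    return [sys.executable, "-m", "pip", "install"] + pins


def upgrade_all(latest_versions, dry_run=True):
    """Return the command, and only run it when dry_run is False."""
    command = upgrade_command(latest_versions)
    if not dry_run:
        subprocess.check_call(command)
    return command


print(upgrade_all({"Django": "1.7.1", "ansible": "1.7.2"}))
```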

Keep in mind that pip-review only works with packages that follow the versioning scheme outlined in PEP 440. A notable package that until recently didn't follow this convention was pytz. It used a <YEAR><revision letter> format, so a version like 2013b looked like a beta release to pip-review rather than the second stable release of 2013. As of version 2013.6, pytz has switched to a <YEAR>.<revision number> scheme that works well with PEP 440 and pip-review.
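To see why 2013b trips up version-aware tools, the small Python sketch below classifies version strings using a deliberately simplified subset of PEP 440. A version is a dotted release number, optionally followed by an a, b or rc pre-release marker. Full PEP 440 also accepts spellings such as alpha, beta and c, plus epochs, post-releases, dev releases and local versions, so treat this as illustrative only.

```python
import re

# Simplified subset of PEP 440: release segment plus optional a/b/rc marker.
SIMPLE_PEP440 = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d*))?$")


def classify(version):
    """Return 'final', 'pre-release' or 'invalid' under the simplified rules."""
    match = SIMPLE_PEP440.match(version)
    if match is None:
        return "invalid"
    return "pre-release" if match.group(2) else "final"


for version in ("2013b", "2013d", "2013.6"):
    print(version, classify(version))
# 2013b pre-release
# 2013d invalid
# 2013.6 final
```

The letter b is read as "beta", while d isn't a recognised marker at all. As a result, neither old-style pytz version behaves like an ordinary stable release.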

Cryptographically guaranteeing packages

You can pin version numbers and commit IDs when freezing package requirements with pip, but you can't pin package contents to a specific cryptographic hash. Suppose a package's contents changed on PyPI without a version increment, or a man-in-the-middle attack took place between your machine and PyPI. pip alone wouldn't tell you.

There is a tool called peep that hashes the packages in your requirements file and prints a sha256 hash to place above each package line. Once those hashes are in the file, peep checks every package it downloads on later installs (e.g. on a continuous integration server) against the hash saved in the requirements file.

$ pip install peep
$ peep install -r requirements.txt
Downloading/unpacking Jinja2==2.7.3
...
Successfully downloaded requests
Cleaning up...

The following packages had no hashes specified in the requirements file, which
leaves them open to tampering. Vet these packages to your satisfaction, then
add these "sha256" lines like so:

# sha256: LiSsXQBNtXFJdqBKwOgMbfbkfpjDVMssDYL4h51Pj9s
Jinja2==2.7.3

# sha256: pOwa_1m5WhS0XrLiN2GgF56YMZ2lp-t2tW6ozce4ccM
MarkupSafe==0.23

# sha256: w2yTiocuX_SUk4szsUqqFWy0OexnVI_Ks1Nbt4sIRug
PyYAML==3.11

# sha256: oXznFrR_gx6pyIvaszdT-jhwx-qKHuXowSh8pAyEmKw
ansible==1.7.2

# sha256: Bx84YUW6eC21bJppfjQAY_rkIrbscuoo4Gsde5WUyVk
dopy==0.3.0

# sha256: jjtsGT-R3JSy87AmHj6rvcYE94_5n9rTJKVv3QtelYw
ecdsa==0.11

# sha256: mgm6flv6VZOHKXqbO8qq1wKZtedoAsXqSCn2h9urojE
paramiko==1.15.1

# sha256: 8s4emJsnLPy2d2FnY-Ci5-xlnv-meoiqkrOmVSj2Cjw
pycrypto==2.6.1

# sha256: EkiQ9BcjyFqoLf4IB0Mq6kbSSusNr840CWnSCJVIwsM
requests==2.4.3

-------------------------------
Not proceeding to installation.
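The hashes peep printed above are 43 characters long and use `-` and `_`. That is consistent with a SHA-256 digest encoded as URL-safe base64 with the trailing `=` padding stripped. The sketch below reproduces that encoding for a local archive file. It assumes this is peep's format and that the hash covers the downloaded package archive. `peep_style_hash` is a hypothetical name, not part of peep's interface.

```python
import base64
import hashlib
import os
import tempfile


def peep_style_hash(path, chunk_size=65536):
    """SHA-256 of a file, as URL-safe base64 without '=' padding."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        # Read in chunks so large archives don't need to fit in memory.
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return base64.urlsafe_b64encode(digest.digest()).decode("ascii").rstrip("=")


# Usage: hash a throwaway empty file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    empty_path = handle.name
empty_hash = peep_style_hash(empty_path)
os.remove(empty_path)
print(empty_hash)  # 47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU
```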

I took the pairs of hashes and packages generated above and placed them back into requirements.txt:

$ cat requirements.txt
# sha256: LiSsXQBNtXFJdqBKwOgMbfbkfpjDVMssDYL4h51Pj9s
Jinja2==2.7.3

# sha256: pOwa_1m5WhS0XrLiN2GgF56YMZ2lp-t2tW6ozce4ccM
MarkupSafe==0.23

# sha256: w2yTiocuX_SUk4szsUqqFWy0OexnVI_Ks1Nbt4sIRug
PyYAML==3.11

# sha256: oXznFrR_gx6pyIvaszdT-jhwx-qKHuXowSh8pAyEmKw
ansible==1.7.2

# sha256: Bx84YUW6eC21bJppfjQAY_rkIrbscuoo4Gsde5WUyVk
dopy==0.3.0

# sha256: jjtsGT-R3JSy87AmHj6rvcYE94_5n9rTJKVv3QtelYw
ecdsa==0.11

# sha256: mgm6flv6VZOHKXqbO8qq1wKZtedoAsXqSCn2h9urojE
paramiko==1.15.1

# sha256: 8s4emJsnLPy2d2FnY-Ci5-xlnv-meoiqkrOmVSj2Cjw
pycrypto==2.6.1

# sha256: EkiQ9BcjyFqoLf4IB0Mq6kbSSusNr840CWnSCJVIwsM
requests==2.4.3

From then on I can install requirements via peep and be sure I'm getting cryptographically identical package contents every time they're downloaded:

$ peep install -r requirements.txt
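The check peep performs at install time can be sketched in a few lines of Python. First, read the `# sha256:` comments that sit above each requirement. Then hash the downloaded archive the same way and refuse anything that doesn't match. This is an illustration rather than peep's code, and the function names are hypothetical. Like peep in the output earlier, it treats a requirement with no hashes as a failure.

```python
import base64
import hashlib
import re

HASH_LINE = re.compile(r"^#\s*sha256:\s*(\S+)$")


def parse_hashed_requirements(text):
    """Map each requirement line to the sha256 hashes listed above it."""
    table, pending = {}, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = HASH_LINE.match(line)
        if match:
            pending.append(match.group(1))
        elif not line.startswith("#"):
            table[line] = pending
            pending = []
    return table


def encoded_sha256(data):
    return base64.urlsafe_b64encode(hashlib.sha256(data).digest()).decode("ascii").rstrip("=")


def verify(requirement, archive_bytes, table):
    """True only if the archive matches one of the requirement's pinned hashes."""
    return encoded_sha256(archive_bytes) in table.get(requirement, [])


requirements = """
# sha256: LiSsXQBNtXFJdqBKwOgMbfbkfpjDVMssDYL4h51Pj9s
Jinja2==2.7.3

# sha256: pOwa_1m5WhS0XrLiN2GgF56YMZ2lp-t2tW6ozce4ccM
MarkupSafe==0.23
"""
table = parse_hashed_requirements(requirements)
print(table["Jinja2==2.7.3"])  # ['LiSsXQBNtXFJdqBKwOgMbfbkfpjDVMssDYL4h51Pj9s']
```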

Dependency graphs

You can see a package's dependencies using pip, but if you want to see the dependencies of those dependencies you need to go through each of them individually:

$ pip show ansible | grep Requires
Requires: paramiko, jinja2, PyYAML, httplib2

$ pip show paramiko | grep Requires
Requires: pycrypto, ecdsa

There is a tool called pipdeptree that shows the whole family tree of dependencies in your environment.

$ pip install pipdeptree
$ pipdeptree
...
argparse==1.2.1
wsgiref==0.1.2
peep==1.3
ansible==1.7.1
  - paramiko [installed: 1.15.1]
    - pycrypto [required: >=2.1, installed: 2.6.1]
    - ecdsa [required: >=0.11, installed: 0.11]
  - Jinja2 [installed: 2.7.3]
    - MarkupSafe [installed: 0.23]
  - PyYAML [installed: 3.11]
  - setuptools
  - pycrypto [required: >=2.6, installed: 2.6.1]
dopy==0.3.0
  - requests [required: >=1.0.4, installed: 2.4.3]

This can be useful on projects that have not kept well-maintained requirements.txt files or for separating development from production requirements.
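The tree itself is a depth-first walk over a package-to-requirements mapping, starting from the packages nothing else depends on. The sketch below builds that mapping by hand from the pip show output above. pipdeptree reads installed package metadata instead. The sketch also omits version constraints and uses a hypothetical `render_tree` helper.

```python
def render_tree(graph, installed):
    """Return pipdeptree-style lines for a {package: [requirements]} mapping.

    Top-level packages are those that no other package requires.
    """
    required = {dep for deps in graph.values() for dep in deps}
    roots = sorted((name for name in graph if name not in required), key=str.lower)
    lines = []

    def walk(name, depth, ancestors):
        version = installed.get(name, "?")
        if depth == 0:
            lines.append("%s==%s" % (name, version))
        else:
            lines.append("%s- %s [installed: %s]" % ("  " * depth, name, version))
        if name in ancestors:  # stop if a dependency cycle loops back
            return
        for dep in graph.get(name, []):
            walk(dep, depth + 1, ancestors | {name})

    for root in roots:
        walk(root, 0, frozenset())
    return lines


graph = {
    "ansible": ["paramiko", "Jinja2", "PyYAML"],
    "paramiko": ["pycrypto", "ecdsa"],
    "Jinja2": ["MarkupSafe"],
    "dopy": ["requests"],
    "MarkupSafe": [], "PyYAML": [], "pycrypto": [], "ecdsa": [], "requests": [],
}
installed = {"ansible": "1.7.1", "paramiko": "1.15.1", "pycrypto": "2.6.1", "ecdsa": "0.11",
             "Jinja2": "2.7.3", "MarkupSafe": "0.23", "PyYAML": "3.11",
             "dopy": "0.3.0", "requests": "2.4.3"}
print("\n".join(render_tree(graph, installed)))
```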

Check for package conflicts

If you want to find dependency conflicts among your requirements, check-pip-dependencies and pip-conflict-checker are two tools that can help track them down. Both projects have low commit counts and little activity on GitHub, so it might be worth double-checking their results afterwards.
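At their core, conflict checkers compare every version constraint a package declares against what is actually installed. The sketch below assumes simple `operator version` constraints and numeric versions. The first three constraints come from the pipdeptree output above. The fourth is invented to show a failure. This is not how check-pip-dependencies or pip-conflict-checker are implemented.

```python
import operator

OPERATORS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
             "<=": operator.le, ">": operator.gt, "<": operator.lt}


def version_tuple(version):
    """'2.6.0' -> (2, 6); assumes numeric dotted versions, ignores trailing zeros."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def find_conflicts(constraints, installed):
    """Return the constraints that the installed versions fail to satisfy.

    constraints is a list of (required_by, package, operator, version) tuples.
    """
    problems = []
    for required_by, package, op, wanted in constraints:
        have = installed.get(package)
        if have is None or not OPERATORS[op](version_tuple(have), version_tuple(wanted)):
            problems.append((required_by, package, op, wanted, have))
    return problems


constraints = [
    ("paramiko", "pycrypto", ">=", "2.1"),
    ("ansible", "pycrypto", ">=", "2.6"),
    ("paramiko", "ecdsa", ">=", "0.11"),
    ("hypothetical-tool", "pycrypto", "<", "2.5"),  # invented to force a conflict
]
installed = {"pycrypto": "2.6.1", "ecdsa": "0.11"}
print(find_conflicts(constraints, installed))
# [('hypothetical-tool', 'pycrypto', '<', '2.5', '2.6.1')]
```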

Excluding packages

In 2012 and 2013 I had problems with distribute==0.6.4 appearing in the requirements.txt files I was working with. It would break pip when pip tried to install distribute. Blindly running pip freeze > requirements.txt would usually include it.

I never found a way to blacklist packages by name with pip, so I resorted to the following bash command to avoid installing distribute when installing package requirements:

$ grep -v distribute requirements.txt | xargs pip install
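One caveat with `grep -v distribute` is that it drops any line containing that substring. That includes comments and any differently named package that happens to contain it. The stricter Python sketch below compares each requirement's package name exactly, ignoring case. It does not apply PEP 503 style normalisation of `-`, `_` and `.`. `distribute-utils` is a made-up name, used only to show the difference.

```python
import re

# A requirement's name runs up to the first character that can't be part of it,
# e.g. the '=' in 'distribute==0.6.4' or the '[' of an extras list.
NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def without_packages(lines, excluded):
    """Drop requirement lines whose package name is in `excluded`."""
    banned = {name.lower() for name in excluded}
    kept = []
    for line in lines:
        match = NAME.match(line)
        if match and match.group(1).lower() in banned:
            continue
        kept.append(line)
    return kept


lines = ["Django==1.7.1", "distribute==0.6.4", "Distribute==0.6.4", "distribute-utils==1.0"]
print(without_packages(lines, ["distribute"]))
# ['Django==1.7.1', 'distribute-utils==1.0']
```

Write the kept lines to a new file and pass that file to `pip install -r` as usual.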

Remove unneeded packages

If you want to remove a package along with any dependencies that nothing else needs, pip-autoremove can do the job. List the top-level packages you want to uninstall and it will remove them, along with any dependencies left orphaned:

$ pip-autoremove Flask Sphinx -y
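Removing orphans comes down to a reachability check over the dependency graph. The sketch below is not pip-autoremove's actual code. It collects everything the named packages depend on, then keeps anything still reachable from the packages that remain. The graph is a hypothetical, simplified environment. This version also removes the named packages even if something else depends on them.

```python
def closure(graph, start):
    """Every package reachable from `start`, including the start packages."""
    seen, stack = set(), list(start)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, []))
    return seen


def autoremove_plan(graph, requested):
    """Requested packages plus any of their dependencies nothing else needs."""
    candidates = closure(graph, requested)
    still_needed = closure(graph, set(graph) - candidates)
    return set(requested) | (candidates - still_needed)


graph = {  # hypothetical environment: package -> direct requirements
    "Flask": ["Werkzeug", "Jinja2", "itsdangerous"],
    "Sphinx": ["Jinja2", "docutils", "Pygments"],
    "ansible": ["Jinja2", "PyYAML"],
    "Jinja2": ["MarkupSafe"],
    "Werkzeug": [], "itsdangerous": [], "docutils": [],
    "Pygments": [], "MarkupSafe": [], "PyYAML": [],
}
print(sorted(autoremove_plan(graph, ["Flask", "Sphinx"]), key=str.lower))
# ['docutils', 'Flask', 'itsdangerous', 'Pygments', 'Sphinx', 'Werkzeug']
```

Jinja2 and MarkupSafe survive because ansible, which isn't being removed, still needs them.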

Separate production and development requirements

If you wish to keep development dependencies off production systems then splitting your requirements.txt file up into two separate files can be helpful.

But if you then run pip freeze to save updated version pins, it won't split the packages back into separate production and development requirements files. pip-dump, another tool in pip-tools, will do that.

It matches package names against the requirements files whose names fit the following patterns:

GLOB_PATTERNS = (u'*requirements.txt', u'requirements[_-]*.txt', u'requirements/*.txt')
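To check which of your files those patterns would pick up, you can use the standard library's `fnmatch` module to test candidate names against the same tuple. This sketch only tests a list of names rather than searching the filesystem. Note that `fnmatch`'s `*` also matches `/`, so it is slightly looser than a shell glob.

```python
from fnmatch import fnmatchcase

GLOB_PATTERNS = (u'*requirements.txt', u'requirements[_-]*.txt', u'requirements/*.txt')


def requirements_files(paths, patterns=GLOB_PATTERNS):
    """Keep the paths that match at least one pattern (case-sensitive)."""
    return [path for path in paths if any(fnmatchcase(path, p) for p in patterns)]


paths = ["requirements.txt", "requirements-dev.txt", "requirements_test.txt",
         "requirements/base.txt", "dev-requirements.txt", "setup.py", "requirements.in"]
print(requirements_files(paths))
# ['requirements.txt', 'requirements-dev.txt', 'requirements_test.txt',
#  'requirements/base.txt', 'dev-requirements.txt']
```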

I've added a few older packages to my requirements files here:

$ cat requirements.txt
Django==1.4.16
ansible==1.4.5
requests==2.4.3

$ cat requirements-dev.txt
coverage==3.7.1
dopy==0.3.0
peep==1.3
pip-tools==0.3.5
pipdeptree==0.4.1
pycrypto==2.6.1

I'll install them, update them to their latest versions and then save them back into their respective requirements files:

$ pip install -r requirements.txt
$ pip install -r requirements-dev.txt
$ pip-review --auto
$ pip-dump

Packages that were in requirements.txt and requirements-dev.txt now have their respective latest version numbers:

$ cat requirements.txt
ansible==1.7.2
Django==1.7.1
ecdsa==0.11
httplib2==0.9
Jinja2==2.7.3
MarkupSafe==0.23
paramiko==1.15.1
PyYAML==3.11
requests==2.4.3

$ cat requirements-dev.txt
coverage==3.7.1
dopy==0.3.0
peep==1.3
pip-tools==0.3.5
pipdeptree==0.4.1
pycrypto==2.6.1
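The splitting step can be approximated in Python. For each line of pip freeze output, find the requirements file that already names that package and append the pin there. In this sketch, packages that appear in no file go to the first file. That covers dependencies such as ecdsa or httplib2 above. This rule matches the output above but is an assumption, not documented pip-dump behaviour. `split_freeze` is a hypothetical helper.

```python
def split_freeze(freeze_lines, files):
    """Assign each pinned package to the requirements file that already names it.

    files: ordered mapping of filename -> set of lowercase package names.
    Packages named in no file go to the first file (an assumption, see above).
    """
    first = next(iter(files))
    result = {name: [] for name in files}
    for line in freeze_lines:
        package = line.split("==")[0].lower()
        target = next((f for f, names in files.items() if package in names), first)
        result[target].append(line)
    for lines in result.values():
        lines.sort(key=str.lower)  # case-insensitive order, as in the output above
    return result


files = {
    "requirements.txt": {"django", "ansible", "requests"},
    "requirements-dev.txt": {"coverage", "dopy", "peep", "pip-tools", "pipdeptree", "pycrypto"},
}
freeze = ["Django==1.7.1", "ansible==1.7.2", "pycrypto==2.6.1", "ecdsa==0.11", "coverage==3.7.1"]
for filename, lines in split_freeze(freeze, files).items():
    print(filename, lines)
```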

Linting your requirements files

If you want to find possible issues in your requirements files then piplint can help:

$ pip install piplint
$ piplint requirements*

For debugging purposes, the following packages are installed but not in the requirements file(s):
argparse==1.2.1
piplint==0.2.0
wsgiref==0.1.2
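The report above lists packages that are installed but not named in any requirements file, which boils down to a set difference on package names. The minimal sketch below uses a hypothetical `unlisted` helper and assumes simple `name==version` pins. piplint may check more than this.

```python
def package_name(line):
    """'Django==1.7.1' -> 'django'; assumes simple name==version pins."""
    return line.split("==")[0].strip().lower()


def unlisted(installed_lines, requirement_lines):
    """Installed pins whose package appears in no requirements file."""
    listed = {
        package_name(line) for line in requirement_lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [line for line in installed_lines if package_name(line) not in listed]


installed = ["argparse==1.2.1", "Django==1.7.1", "piplint==0.2.0", "wsgiref==0.1.2"]
requirements = ["# production", "Django==1.7.1", ""]
print(unlisted(installed, requirements))
# ['argparse==1.2.1', 'piplint==0.2.0', 'wsgiref==0.1.2']
```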

Speeding up installations

PyPI uses Fastly as its CDN, so download speeds should be reasonably consistent. That's not to say the installation process can't be sped up. pip-accel caches source archives and the binaries compiled during installation, so they don't need to be downloaded or recompiled when you install the same version again.

$ cat requirements.txt
pytz==2013.6
$ pip-accel install -r requirements.txt
...
... Done! Took 4.94 seconds to install 1 package.
$ pip uninstall pytz
...
  Successfully uninstalled pytz
$ pip-accel install -r requirements.txt
...
... Executing command: pip install \
    --download-cache=/home/mark/.pip/download-cache \
    --find-links=file:///home/mark/.pip-accel/sources \
    --build-directory=/tmp/tmpztEZvg \
    --no-index -r requirements.txt --no-install
...
... Done! Took 1.48 second to install 1 package.
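The speed-up rests on a simple idea: key the cache on package name and version, and only fetch when nothing is cached yet. The minimal sketch below illustrates just that idea. pip-accel's real cache layout and its handling of compiled binaries are more involved. `cached_archive`, the `.tar.gz` file naming and the `download` callable are all hypothetical.

```python
import os
import tempfile


def cached_archive(name, version, cache_dir, download):
    """Return the cached archive path, calling download() only on a cache miss.

    download is a hypothetical callable taking (name, version, destination).
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "%s-%s.tar.gz" % (name, version))
    if not os.path.exists(path):
        download(name, version, path)
    return path


calls = []


def fake_download(name, version, destination):
    """Stand-in downloader that records each call and writes a dummy file."""
    calls.append((name, version))
    with open(destination, "wb") as handle:
        handle.write(b"pretend archive")


cache = tempfile.mkdtemp()
first = cached_archive("pytz", "2013.6", cache, fake_download)
second = cached_archive("pytz", "2013.6", cache, fake_download)
print(len(calls))  # 1: the second request was served from the cache
```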

Copyright © 2014 - 2017 Mark Litwintschik. This site's template is based off a template by Giulio Fidente.