Tracking which wheels can be reproducibly built

Being able to reproducibly build binary artifacts means that users, developers, and others can agree that the shipped artifact was correctly built from the source code (which anyone can inspect), and that no malicious code was introduced, intentionally or not, during the build process.

One hiccup we’ve encountered in SecureDrop development is that not all Python wheels can be built reproducibly. We ship multiple (Python) projects in Debian packages, with Python dependencies included in those packages as wheels. In order for our Debian packages to be reproducible, we need that wheel build process to also be reproducible. The wheel build is reproducible (as of wheel 0.27.0 - see relevant issue) if you set SOURCE_DATE_EPOCH to a constant value. However, there are still sources of nondeterminism for some projects.

For our purposes, this has resulted in our building the wheels (once), saving those wheels on a pip mirror, and then using those wheels at Debian package build time. A few times, we’ve asked, “wait, which wheels can’t be reproducibly built again?” So I made a little tracker at https://reproduciblewheels.com/ for convenience.

EDIT: As of August 19, 2020, passing a static --build directory to the pip wheel command below means that all currently tracked wheels are reproducible 🎉.

How it works

I first selected the 100 most popular packages on PyPI in the past year plus any dependencies that are on FPF’s pip mirror[1].

Then, I have a little function that builds the wheel twice and compares the SHA256 hashes of the two results to determine whether they are the same. The build command is:

    python3 -m pip wheel --no-binary :all: --no-cache-dir

Here --no-binary :all: is used to ensure that I download the source tarball and --no-cache-dir is used so that I don’t inadvertently use a cached built artifact.
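The build-twice-and-compare step could be sketched roughly as follows. This is a minimal illustration, not the tracker's actual code: the function names (sha256_of_file, build_wheel, is_reproducible) are hypothetical, the SOURCE_DATE_EPOCH value is an arbitrary constant, and the --no-deps and --wheel-dir options are my additions so that each build leaves exactly one wheel in a known directory.

```python
import hashlib
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256_of_file(path):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_wheel(project, out_dir):
    """Build `project` from its source tarball into `out_dir`; return the wheel path."""
    # Assumption: any constant timestamp works; this value is arbitrary.
    env = dict(os.environ, SOURCE_DATE_EPOCH="1309379017")
    subprocess.run(
        [
            sys.executable, "-m", "pip", "wheel", project,
            "--no-binary", ":all:",   # force a build from source
            "--no-cache-dir",         # never reuse a cached artifact
            "--no-deps",              # assumption: only build the project itself
            "--wheel-dir", str(out_dir),
        ],
        check=True,
        env=env,
    )
    wheels = sorted(Path(out_dir).glob("*.whl"))
    if not wheels:
        raise RuntimeError(f"no wheel was produced for {project}")
    return wheels[0]


def is_reproducible(project, builder=build_wheel):
    """Build the wheel twice in separate directories and compare the hashes."""
    with tempfile.TemporaryDirectory() as first, \
            tempfile.TemporaryDirectory() as second:
        first_hash = sha256_of_file(builder(project, first))
        second_hash = sha256_of_file(builder(project, second))
    return first_hash == second_hash
```

Calling is_reproducible("some-project") would then return True only if both builds are byte-for-byte identical. The builder parameter is there so the comparison logic can be exercised without touching the network.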

A friendly bot runs the above build function nightly for every monitored project, saves the results as JSON, and then updates the static HTML, which GitHub Pages deploys when it’s committed to the main branch. That’s it!
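For a sense of what the "results as JSON, then static HTML" step might look like, here is a small sketch. The result shape (a mapping from project name to a boolean) and the function names are assumptions made for illustration; the real site's schema and markup may well differ.

```python
import html
import json


def results_to_json(results):
    """Serialize {project_name: is_reproducible} with stable key ordering."""
    return json.dumps(results, indent=2, sort_keys=True)


def render_html(results):
    """Render the results as a simple HTML table, one row per project."""
    rows = []
    for name in sorted(results):
        status = "reproducible" if results[name] else "not reproducible"
        rows.append(f"<tr><td>{html.escape(name)}</td><td>{status}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Sorting the keys in both outputs keeps the nightly commits small: a diff only appears when a project's status actually changes.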

If you find issues or think it should be tracking something else, just open an issue.

1. I should note here that from this set I excluded a few (7) projects that either needed additional build requirements that I didn’t have out of the box in the build environment or had some other build-time issue (for the interested, this is ticket #2 on the bugtracker).