Python Development with Docker

Posted by Mike Birdgeneau

Category: Data

I've been trying to brush up my Python skills after spending a lot of time in R. One of the challenges I've run into is maintaining a consistent development environment on my Mac. Sometimes I work in Python 2, other times in Python 3. I can use virtual environments to maintain a set of packages, but then I move to another machine, and some of the libraries are finicky to install. The bottom line is that I'd rather spend this time developing and learning than fiddling with my local environment.

This is where Docker comes in. Effectively, it allows me to spin up an isolated container (think of it as a lightweight virtual machine) from a scripted, version-controlled configuration, so I can reproduce the same environment easily and consistently on my local machine, across various operating systems, and in any production environment. Easy!

I've set up a simple example using JupyterLab on Docker, which you can view on GitHub: jupyterlab-docker

Docker Components

There are three key components:

  1. docker-compose.yml: The main configuration file for my Docker setup.
  2. jupyterlab/Dockerfile: The container configuration for the JupyterLab machine.
  3. jupyterlab/requirements.txt: An easy way to add additional Python libraries as needed!

It's probably easiest to explore these through the GitHub repository, but here's a quick overview of each:

docker-compose.yml

The docker-compose.yml file defines one or more Docker containers that run together, with shared networking and data. It also lets us mount a folder called data that persists after we shut down the instance.

It also lets us add more containers in the future that can talk to each other. For example, we could spin up a container with a database instance to work with the JupyterLab notebooks.
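To make that concrete, here is a minimal sketch of what such a file might look like. It is not the file from the repository: the service name, the container-side path /data, and the assumption that JupyterLab listens on its default port 8888 inside the container are all illustrative, and the layout uses the top-level services key found in newer Compose file formats.

```yaml
# docker-compose.yml (illustrative sketch, not the repository's actual file)
services:
  jupyterlab:
    build: ./jupyterlab      # build the image from jupyterlab/Dockerfile
    ports:
      - "8888:8888"          # expose JupyterLab at http://localhost:8888
    volumes:
      - ./data:/data         # assumed container path; files here outlive the container
```

Anything written to the mounted folder lives on the host, which is why it is still there after the container is shut down.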

Dockerfile

The Dockerfile is the main image configuration. We start with a base Python image built on the small Alpine Linux distribution, then add the dependencies we need to build Python packages, as well as JupyterLab.

requirements.txt

When working in JupyterLab, we may find that we need additional Python packages. Although we could open a terminal and install the package using pip, we would have to do this again each time the container is recreated. A simple fix is to add the package to the requirements.txt file, and it will be installed automatically when we build the Docker image.

Additional Infrastructure

The files in the GitHub repo allow us to spin up a quick JupyterLab setup by running docker-compose build, then docker-compose up, and then navigating to http://localhost:8888. Cool!

There are a few other useful components:

  1. Travis CI: Including the .travis.yml file in the repo and enabling the repository on Travis CI tests the build each time we make changes, so we can make sure that future changes don't break the image.
  2. Docker Hub: Setting up an automated build from this GitHub repository means that we can easily reuse this image in the docker-compose.yml of other projects. For example, we could do something like the following to add a PostgreSQL database accessible from within JupyterLab:

jupyterlab:
  image: mikebirdgeneau/jupyterlab
  links:
    - db:db
...
db:
  image: postgres
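For the Travis CI item above, a build check can be quite small. The following is a hypothetical sketch rather than the .travis.yml from the repository; it assumes that simply building the image is enough of a test, and that docker-compose is available on the build machine.

```yaml
# .travis.yml (hypothetical sketch, not the repository's actual file)
language: generic          # no language runtime is needed just to build an image
services:
  - docker                 # ask Travis CI to start the Docker daemon
script:
  - docker-compose build   # the build fails if the image no longer builds
```

If the build step exits with an error, Travis CI marks the commit as failed, which is what catches a change that breaks the image.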

All in all, Docker should be a great time-saver, allowing me to focus on learning and development rather than infrastructure. Finally, we could simply deploy this Docker setup on a cloud server, and we're up and running!
