I've been trying to brush up on my Python skills after spending a lot of time in R. One of the challenges I've run into is maintaining a consistent development environment on my Mac. Sometimes I work in Python 2, other times in Python 3. I can use virtual environments to maintain a set of packages, but then I move to another machine and some of the libraries are finicky to install. The bottom line is that I'd rather spend this time developing & learning than fiddling with my local environment.
This is where Docker comes in. Effectively, it lets me spin up a lightweight, containerized environment from a scripted (version-controlled) configuration, so I can reproduce the same environment easily and consistently on my local machine, across operating systems, and in any production environment. Easy!
I've set up a simple example using JupyterLab on Docker, which you can view on GitHub: jupyterlab-docker
There are a couple of key components:
- docker-compose.yml: The main configuration file for the Docker setup.
- jupyterlab/Dockerfile: The container configuration for the JupyterLab image.
- jupyterlab/requirements.txt: An easy way to add additional Python libraries as needed!
It's probably easiest to explore these through the GitHub repository, but here's a quick overview of each:
The docker-compose.yml file allows for the inclusion of multiple Docker containers that run together with networking and data sharing. This lets us create a folder called data that will persist after we shut down the instance.
This also allows us to add additional containers in the future that can talk to each other; for example, we could spin up a container with a database instance to work with the JupyterLab notebooks.
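As a rough sketch of what such a compose file can look like (the port mapping and volume paths here are illustrative assumptions, not copied from the repo):

```yaml
# Illustrative docker-compose.yml sketch; service details are assumptions.
jupyterlab:
  build: jupyterlab/          # builds from jupyterlab/Dockerfile
  ports:
    - "8888:8888"             # JupyterLab's default port, mapped to the host
  volumes:
    - ./data:/data            # the data folder persists after shutdown
```

Because the data folder is mounted from the host rather than stored inside the container, notebooks saved there survive `docker-compose down`.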
The Dockerfile is the main image configuration. We start with a base Python image using the small Alpine Linux distribution, then add the dependencies we need to build Python packages, as well as JupyterLab.
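A minimal Dockerfile along these lines might look like the following (the specific package names and the startup command are illustrative assumptions, not the exact file from the repo):

```dockerfile
# Illustrative sketch of an Alpine-based JupyterLab image.
FROM python:3-alpine

# Build tools needed to compile some Python packages on Alpine
RUN apk add --no-cache build-base libffi-dev

# Install JupyterLab plus any extra libraries listed in requirements.txt
COPY requirements.txt .
RUN pip install --no-cache-dir jupyterlab -r requirements.txt

EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root"]
```

The build tools are the usual price of Alpine's small size: many packages ship no prebuilt wheels for it and must be compiled during `pip install`.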
When working in JupyterLab, we may find that we need additional Python packages. Although we could open a terminal and install a package using pip, we would have to do this each time we start the container. A simpler fix is to add the package to the requirements.txt file, and it will be installed automatically when we build the Docker image.
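For example, to make a few common libraries available in every fresh container, requirements.txt might look like this (the entries are illustrative; pin versions as you prefer):

```text
# requirements.txt (illustrative)
numpy
pandas
matplotlib
```

After editing the file, run docker-compose build again so the image picks up the new packages.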
The files in the GitHub repo let us spin up a quick JupyterLab setup by simply typing docker-compose build, then docker-compose up, and then navigating to the URL that JupyterLab prints in the container logs.
There are a few other useful components:
- Travis CI: For testing the build each time we make changes. By including the .travis.yml file in the repo and enabling the repository on Travis CI, we can make sure that future changes don't break the image.
- Docker Hub: Setting up an automated build from this GitHub repository means we can easily use this image in other docker-compose.yml configurations, where we could do something like the following to add a PostgreSQL database accessible from within JupyterLab:
```yaml
jupyterlab:
  image: mikebirdgeneau/jupyterlab
  links:
    - db:db
  ...
db:
  image: postgres
```
All in all, Docker should be a great time-saver and allow me to focus on learning & development rather than infrastructure. Finally, we could simply deploy this Docker setup on a cloud server, and we're up and running!