The Problem
So Docker is a thing, and when it works, it is magical: the power to run your code on any system regardless of host operating system, system dependencies, etc. The problem is getting one's software to the point where it is containerized properly and usable. There are some guides about how to do this, but most are about how to deploy your webapp with Docker. I, being a bioinformatician, am interested in processing files, not serving a webapp, and this sort of conflicts with the point of Docker. Docker tries very hard to ensure that your docker application can only do what you tell it, and cannot affect the host operating system. This is great, until you want to, say, write an output file to your host operating system.
Docker has ways around this, but they aren’t necessarily straightforward. So, on to a practical example!
Prepare your environment
Get Docker installed, make a directory for us to play with somewhere on your machine, and move into it.
mkdir trydocker
cd trydocker
Let's say we have a command-line tool. Make a script with these contents and save it to a file called magic.sh.
#!/bin/bash
# magic.sh
# takes an output directory and an input file
mkdir "$1"
cat "$2" > "${1}/results"
echo "...which is the best?" >> "${1}/results"
This script takes a target directory and an input file; all it does is copy the input file into a results file in the new directory and append a bit of text. Create an input file called “species.txt”.
Loxodonta africana
Loxodonta cyclotis
Elephas maximus
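If you would rather create it straight from the terminal than in an editor, a heredoc does the trick (either way is fine):

cat > species.txt << 'EOF'
Loxodonta africana
Loxodonta cyclotis
Elephas maximus
EOF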
Try it out by making the script executable with chmod, and then running it with an output directory called “local_elephants” and our input file species.txt:
chmod +x ./magic.sh
./magic.sh local_elephants species.txt
And take a look at what we ended up with:
.
├── local_elephants
│ └── results
├── magic.sh
└── species.txt
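If you peek at the results file, you should see the species list with our extra line appended:

cat local_elephants/results
Loxodonta africana
Loxodonta cyclotis
Elephas maximus
...which is the best?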
This does two things that Docker is not going to be happy about: creating a directory on the host OS, and creating a file on the host OS. But let's see that firsthand, shall we? Make a Dockerfile for this project:
FROM ubuntu:latest
LABEL maintainer="me@memail.com"
# copy our script from the host OS into the container;
# in a real scenario this could be installing stuff from git, apt, etc.
COPY ./magic.sh /bin/magic.sh
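One note on permissions: COPY keeps the file mode from the build context, so the chmod +x we ran earlier carries into the image. If you would rather not rely on that, an optional line after the COPY makes it explicit (not strictly needed here):

# optional: guarantee the script is executable inside the image
RUN chmod +x /bin/magic.sh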
and let's build it!
docker build -t me/trydocker .
You should see your image get built, tagged as me/trydocker in case we are uploading it to Docker Hub later. Let's try to run it:
docker run -i me/trydocker /bin/magic.sh docker_elephants species.txt
cat: species.txt: No such file or directory
We get an error: no such file! But how can that be, it's right here?! Well, to protect you, Docker can't see it. Further, it can't see our output directory either. You can confirm this by entering the container and looking around:
docker run -it me/trydocker bash
ls /
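Assuming you are now sitting at a shell prompt inside the container, a quick check confirms that the file simply is not there:

ls species.txt   # fails: No such file or directory; it only exists on the host
exit             # drop back out to the host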
Introducing Volumes: the bridge between host and container
What we need is a way for the OS and the container to talk to each other, but only when we want them to. Docker allows us to do this with what's called “volumes”. The docs about this can be a bit opaque, really geared towards computer scientists, so here is my non-expert's explanation.
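The short version: -v takes a host path and a container path separated by a colon, and mounts the first at the second. Schematically (the paths here are placeholders, not something to run verbatim):

docker run -v /path/on/host:/path/in/container some/image some_command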
Let's make a volume so that the container can see our current directory, where species.txt is. We will use the -v flag to say which path the volume maps to on our host OS, and what to call it within the container. Because species.txt is right here, let's set the host side of the volume to be right here with ${PWD}, and let's call the directory on the inside /input.
docker run -v ${PWD}:/input -i me/trydocker /bin/magic.sh docker_elephants species.txt
cat: species.txt: No such file or directory
That still didn't work. Why? Well, we made the bridge, but now we have to refer to the input file in relation to what we named the bridge FROM THE CONTAINER SIDE (what's to the right of the ":").
docker run -v ${PWD}:/input -i me/trydocker /bin/magic.sh docker_elephants /input/species.txt
No error, but let's take a look at our current directory:
.
├── Dockerfile
├── local_elephants
│ └── results
├── magic.sh
└── species.txt
Wait, where is our output folder, docker_elephants?! Spoiler: it's still in the container :(
We have succeeded in getting the first part of our program to work, but we still need to get our output out of the container. We already have a bridge for the input, but it's good to keep input and output separate. So, let's make a second bridge, this time for our output. Remember what we did to the path for the input? We will do the same thing again: make it a path in relation to the container:
docker run -v ${PWD}:/input -v ${PWD}:/output -i me/trydocker /bin/magic.sh /output/docker_elephants /input/species.txt
And let's look at our current directory:
.
├── Dockerfile
├── docker_elephants
│ └── results
├── local_elephants
│ └── results
├── magic.sh
└── species.txt
Wahoo! The container can now read its input from the host, and write its output back to the host too!
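One refinement worth knowing about, though the script above does not need it: appending :ro to a volume makes it read-only inside the container, which is a nice habit for an input bridge:

docker run -v ${PWD}:/input:ro -v ${PWD}:/output -i me/trydocker /bin/magic.sh /output/docker_elephants /input/species.txt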
In Closing
In a simple scenario, we could have used the --workdir flag to docker run, but sometimes you will have to wrap a messy script that involves changing directories, etc., especially if it is written in Perl. Controlling the volumes like this lets Docker do what it's designed to do: one specific task, without monkeying around with the rest of your host operating system.