Recently I packed up a python program into a Docker image. It took me roughly 3hrs to set up a fully functional docker image for distribution (thanks to docker cached layers!!!).
Here, I’m listing the necessary steps that might help some newbies like me.
As an example, suppose we have a python program prog
consists python scripts under src/
, libraries under lib/
, together with requirements.txt
for pip (or environment.yml
for conda), we can format prog
as the following.
prog/
|-src/
|-main.py
|-lib/
|-environment.yml
|-Dockerfile
Dockerfile
is a configuration file to build a docker image, where we can find the reference here. Basically, here are several keywords might be helpful for creating the Dockerfile
.
FROM
: import/base on another built image. e.g., FROM ubuntu:latest
COPY
: copy a file or directory into docker image. e.g., COPY hello/ /opt/hello
ENV
: set environmental variable. e.g., ENV PATH "/opt/conda/bin:${PATH}"
RUN
: run command(s). e.g., RUN ls -l
ENTRYPOINT
: run command(s) everytime the container get executed as the starting point. e.g., `ENTRYPOINT [ “ls” ]An example of Dockerfile
is given as below.
FROM ubuntu:latest
MAINTAINER WHOAM I <whoami@where.com>
RUN apt-get -y update
RUN apt-get -y install curl
# check archs
RUN echo $(uname -a)
RUN mkdir -p /opt/miniconda3
# miniconda
RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /opt/miniconda3/miniconda.sh
RUN ln -s md5sum /bin/md5
RUN bash /opt/miniconda3/miniconda.sh -b -u -p /opt/conda;
RUN echo "export PATH=/opt/conda/bin:$PATH" > /etc/profile.d/conda.sh
ENV PATH "/opt/conda/bin:${PATH}"
COPY environment.yml /opt/environment.yml
# install all dependencies
RUN conda env create -f /opt/environment.yml
COPY ./prog /opt/prog
# clean up
RUN rm -rf /opt/miniconda3/
RUN apt-get remove -y --purge curl
RUN apt-get autoremove -y
RUN apt-get clean -y
ENTRYPOINT conda run -n venv python /opt/prog/src/main.py
After creating the Dockerfile
, we can build the image via docker build
. Before doing so, make sure docker application is started already (use docker ps
to check). To build the image, we can traverse to the directory prog/
and execute the command.
$ docker build --platform=<platform> -t <image_name> .
Where <platform>
should be the destination platform, such as linux/amd64
, linux/arm64
, or darwin/amd64
. We can supply multiple and connected by comma as well. <image_name>
is optional, and a random string will be provided if <image_name>
is not given. We can check the image name and id via docker image ls
.
Right now, we have created a docker image. we can run docker buildx prune
with y
to cleanup all the caches generated during the image build step. Then, we can run docker run
to create a container and load the image, as follows.
docker run -w /opt -v $(pwd)/src:/opt/dst -it --rm <image_name> args
Here, -w/--workdir
specifies the docker working directory, -i/--interactive
to keep STDIN open even if not attached, -t/--tty
to allocate a pseudo-TTY, and --rm
to delete the created container after completion.
-v/--volume
is the option to bind mount the volume from local directory to a directory within the docker. As an example, -v $(pwd)/src:/opt/dst
bind mount the local directory $(pwd)/src
to the docker directory /opt/dst
within the container. When we create/modify/delete a file within one of them, it will be reflected/shared by the other. In this case, when our program generates output files, we can place them into /opt/dst
. After program completion, they can be found under $(pwd)/src
. We can also put input files into $(pwd)/src
before docker run
as well. args
are the arguments passing to the command specified after ENTRYPOINT
in Dockerfile
, which are served as command-line arguments.
Be mindful that the filesystem in docker is independent to the local machine. Thus, any shared files or directories must be bind mounted via -v
.
Now, you should have a portable docker image, enjoy it!