Recently I packaged a Python program into a Docker image. It took me roughly three hours to set up a fully functional docker image for distribution (thanks to docker's cached layers!).
Here I'm listing the necessary steps, which might help other newbies like me.
As an example, suppose we have a Python program prog consisting of scripts under src/ and libraries under lib/, together with a requirements.txt for pip (or an environment.yml for conda). We can lay out prog as follows.
    prog/
    |- src/
    |   |- main.py
    |- lib/
    |- environment.yml
    |- Dockerfile
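For reference, a minimal environment.yml could look like the following sketch; the environment name venv is what the ENTRYPOINT in the Dockerfile below expects, and the listed packages are only placeholders for your program's actual dependencies.
name: venv
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy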
Dockerfile is the configuration file used to build a docker image; all of its instructions are documented in the official Dockerfile reference. Several keywords are particularly helpful when creating a Dockerfile:
FROM: import/base on another built image, e.g., FROM ubuntu:latest
COPY: copy a file or directory into the docker image, e.g., COPY hello/ /opt/hello
ENV: set an environment variable, e.g., ENV PATH "/opt/conda/bin:${PATH}"
RUN: run command(s), e.g., RUN ls -l
ENTRYPOINT: the command(s) run every time the container gets executed, serving as the starting point, e.g., ENTRYPOINT [ "ls" ]
An example Dockerfile is given below.
FROM ubuntu:latest
LABEL maintainer="WHOAM I <whoami@where.com>"
RUN apt-get -y update
RUN apt-get -y install curl
# check archs
RUN echo $(uname -a)
RUN mkdir -p /opt/miniconda3
# miniconda
RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /opt/miniconda3/miniconda.sh
# the installer may look for an md5 command; alias it to md5sum just in case
RUN ln -s md5sum /bin/md5
RUN bash /opt/miniconda3/miniconda.sh -b -u -p /opt/conda
RUN echo "export PATH=/opt/conda/bin:$PATH" > /etc/profile.d/conda.sh
ENV PATH "/opt/conda/bin:${PATH}"
COPY environment.yml /opt/environment.yml
# install all dependencies
RUN conda env create -f /opt/environment.yml
# the build context is prog/ itself, so copy from the context root
COPY . /opt/prog
# clean up (note: files removed in separate RUN steps still occupy the
# earlier layers; chain these into one RUN to actually shrink the image)
RUN rm -rf /opt/miniconda3/
RUN apt-get remove -y --purge curl
RUN apt-get autoremove -y
RUN apt-get clean -y
# exec form, so that arguments given to docker run are appended;
# "venv" must match the name field in environment.yml
ENTRYPOINT [ "conda", "run", "-n", "venv", "python", "/opt/prog/src/main.py" ]
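Since the ENTRYPOINT calls conda run -n venv, the environment name declared in environment.yml must be venv (as in the sample above). To sanity-check the environment once the image is built (next step), we can override the entrypoint; prog_image is just a placeholder name here.
$ docker run --rm --entrypoint conda prog_image env list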
After creating the Dockerfile, we can build the image via docker build. Before doing so, make sure the Docker daemon is already running (use docker ps to check). To build the image, change into the prog/ directory and execute the following command.
$ docker build --platform=<platform> -t <image_name> .
Here <platform> is the target platform, such as linux/amd64 or linux/arm64 (there is no darwin platform; images built on macOS still target Linux). Multiple platforms can be supplied, separated by commas, although this requires buildx. The -t <image_name> part is optional; if it is omitted, the image is left untagged and must be referenced by its ID. We can check image names and IDs via docker image ls.
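For example, a multi-platform build could look like the sketch below (prog_image is a placeholder; depending on your setup you may first need to create a builder with docker buildx create --use, and add --push to store the multi-platform image in a registry).
$ docker buildx build --platform=linux/amd64,linux/arm64 -t prog_image .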
At this point, we have created a docker image. We can run docker buildx prune (confirming with y) to clean up the caches generated during the image build step. Then, we can run docker run to create a container from the image, as follows.
$ docker run -w /opt -v $(pwd)/src:/opt/dst -it --rm <image_name> args
Here, -w/--workdir specifies the working directory inside the container, -i/--interactive keeps STDIN open even if not attached, -t/--tty allocates a pseudo-TTY, and --rm deletes the created container after completion.
-v/--volume bind mounts a local directory to a directory within the container. For example, -v $(pwd)/src:/opt/dst bind mounts the local directory $(pwd)/src to the directory /opt/dst inside the container. When we create/modify/delete a file in one of them, the change is reflected in the other. So when our program generates output files, it can place them into /opt/dst, and after completion they can be found under $(pwd)/src; likewise, we can put input files into $(pwd)/src before docker run. Finally, args are the arguments passed to the command specified by ENTRYPOINT in the Dockerfile (which is why the exec form is used above), serving as its command-line arguments.
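As a concrete sketch, assuming main.py accepts hypothetical --input/--output flags (these depend entirely on how your program parses its arguments), a full invocation could look like this:
$ mkdir -p src && echo "hello" > src/in.txt
$ docker run -w /opt -v $(pwd)/src:/opt/dst -it --rm prog_image --input /opt/dst/in.txt --output /opt/dst/out.txt
$ ls src/    # out.txt should now appear locally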
Be mindful that the filesystem in docker is independent of the local machine. Thus, any shared files or directories must be bind mounted via -v.
Now you should have a portable docker image. Enjoy!