Docker Tutorial Part 3: Dockerfile
Dockerfile
A Dockerfile
is essentially a text file with a predetermined structure that contains a set of instructions for building a Docker image. The instructions in the Dockerfile specify what base image to start with (for example, Ubuntu 20.04), what software to install, and how to configure the image. The purpose of a Dockerfile is to automate the process of building a Docker image so that the image can be easily reproduced and distributed.
The structure of a Dockerfile is a list of commands (one per line) that Docker (containerd to be exact) uses to build an image. Each command creates a new layer in the image in UnionFS, and the resulting image is the union of all the layers. The fewer layers we manage to create, the smaller the resulting image.
The most frequently used commands in a Dockerfile are the following:
FROM
COPY
ADD
EXPOSE
CMD
ENTRYPOINT
RUN
LABEL
ENV
ARG
VOLUME
USER
WORKDIR
FROM
A Dockerfile starts with a FROM command, which specifies the base image to start with:
FROM ubuntu:20.04
You can also name this build using as keyword followed by a custom name:
FROM ubuntu:20.04 as builder1
docker build
will try to download Docker images from the public Docker Hub registry, but it’s also possible to use other registries out there, or a private one.
COPY and ADD
The COPY
command is used to copy files or directories from the host machine to the container file system. Take the following example:
COPY . /var/www/html
You can also use the ADD
command to add files or directories to your Docker image. ADD
has additional functionality beyond COPY
. It can extract a TAR archive file automatically and check for the presence of a URL in the source field, and if it finds one, it will download the file from the URL. Finally, the ADD
command has a --chown
option to set the ownership of the files in the destination.
In general, it is recommended to use COPY
in most cases, and only use ADD
when the additional functionality it provides is needed.
EXPOSE
The EXPOSE
command in a Dockerfile informs Docker that the container listens on the specified network ports at runtime. It does not actually publish the ports. It is used to provide information to the user about which ports are intended to be published by the container.
For example, if a container runs a web server on port 80
, you would include the following line in your Dockerfile:
EXPOSE 80
You can specify whether the port listens on TCP or UDP – after specifying the port number, add a slash and a TCP or UDP keyword (for example, EXPOSE 80/udp
). The default is TCP if you specify only a port number.
The EXPOSE
command does not publish the ports. To make ports available, you will need to publish them with the use of the -p
or --publish
option when running the docker run command:
docker run -p 8080:80 thedockerimagename:tag
This will map port 8080
on the host machine to port 80
in the container so that any incoming traffic on port 8080
will be forwarded to the web server running in the container on port 80
.
Regardless of the EXPOSE
command, you can publish different ports when running a container. EXPOSE
is used to inform the user about which ports are intended to be published by the container.
ENTRYPOINT and CMD
The ENTRYPOINT
instruction defines the command that will always be executed when the container starts. It essentially turns the container into an executable that behaves like a binary or script. Unlike CMD
, which can be fully replaced by arguments at runtime, ENTRYPOINT
is fixed, and arguments you pass with docker run are simply appended to it.
You usually use ENTRYPOINT
when you want your container to act like a single-purpose tool.
ENTRYPOINT ["curl"]
- Running
docker run mycurl https://example.com
executescurl https://example.com
docker run
: start a new container.mycurl
: the image name.https://example.com
: arguments passed to the container.
Docker does the following internally:
- Looks at the image
mycurl
. - Sees that the
ENTRYPOINT
is set to["curl"]
. - Appends your command-line arguments (
https://example.com
) to theENTRYPOINT
. - It become
curl https://example.com
If you want to provide default arguments that can be overridden, you combine it with CMD
.
ENTRYPOINT ["python"]
CMD ["app.py", "--debug"]
- Running
docker run myapp
:python app.py --debug
- Running
docker run myapp server.py
:python server.py
Meanwhile, if you only use CMD
, it’s fully replaceable at runtime.
CMD ["nginx", "-g", "daemon off;"]
- Running
docker run webserver
: runs nginx - Running
docker run webserver bash
: runs bash instead (overridesCMD
)
Rule of thumb:
- Use
ENTRYPOINT
if the container should always execute a specific binary. - Use
CMD
for providing defaults that the user might want to override. - Combine both if you want an executable with flexible arguments.
RUN
The RUN
instruction is executed at build time and creates a new layer in the image. It’s used to install software, configure the environment, or set up files. Once executed, its results are part of the final image.
For example, you can use the RUN
command to install system dependencies and clean up:
RUN apt-get update && \
apt-get install -y git curl && \ # install required tools
rm -rf /var/lib/apt/lists/* # Cleanup reduces image size.
You can use the RUN command to create a directory:
RUN mkdir -p /data/logs
or prepare a non-root user:
RUN useradd -ms /bin/bash appuser
It’s worth noting that the order of the RUN
commands in the Dockerfile is important, as each command creates a new layer in the image, and the resulting image is the union of all the layers. So, if you’re expecting some packages to be installed later in the process, you need to do it before using them.
LABEL
The LABEL
instruction attaches metadata to an image in the form of key-value pairs. This can include maintainer info, version, licensing, or anything meaningful to your workflow.
LABEL maintainer="Alice Lee <alice@example.com>" \
version="1.4" \
description="Lightweight API server image"
This metadata can later be queried:
docker inspect myimage | grep version
Labeling images makes them easier to manage, track, and audit in CI/CD pipelines.
ENV and ARG
Both define variables, but their lifetimes differ:
ARG
: available only at build time. Used to pass values when running docker build.ENV
: It creates an environment variable that is accessible to all processes running inside the container.
ARG APP_VERSION=latest
RUN echo "Building version $APP_VERSION"
You can override it at build:
docker build --build-arg APP_VERSION=2.0 .
The ENV
command is used to set environment variables:
ENV PORT=8080
EXPOSE $PORT
Now any process inside the container can access $PORT.
Rule of thumb:
- Use
ARG
for build-time options (e.g., base image tag, dependency version). - Use
ENV
for runtime configuration (e.g., service port, API key).
VOLUME
The VOLUME
instruction in a Dockerfile defines a mount point where data can be stored outside the container’s writable layer.
- This means that the data will persist even if the container is removed or rebuilt.
- Volumes are managed by Docker, or you can bind them to host directories.
- They’re especially important for databases, logs, or user-generated content.
When you run a container without volumes, everything is stored in its writable layer.
- That writable layer is destroyed when you remove the container (
docker rm
). - That’s why data disappears.
When you declare a VOLUME
in the Dockerfile (or with -v
at docker run
), Docker does something special:
- It says: “Don’t keep this folder (/var/lib/mysql) inside the container’s temporary writable layer.”
- Instead, it mounts an external storage location (on the host) to that folder.
- The container sees
/var/lib/mysql
as normal, but under the hood, all files actually live in:
/var/lib/docker/volumes/<volume_id>/_data (if Docker-managed volume)
or
/my/local/db (if you used -v /my/local/db:/var/lib/mysql)
Imagine you’re building a Docker image for MySQL. You don’t want your database data to disappear every time the container restarts.
FROM mysql:8.0
## Define where MySQL stores its data
VOLUME ["/var/lib/mysql"]
EXPOSE 3306
VOLUME ["/var/lib/mysql"]
tells Docker that/var/lib/mysql
should be stored outside the container layer system. You can see the volume with
docker volume ls
docker volume inspect <volume_name>
USER
By default, containers run as root, which is risky. The USER
instruction sets a non-root user for better security.
- If there’s a Docker or kernel vulnerability, a process running as root inside the container could potentially escape and gain root access on the host.
- This is the main worry in multi-tenant environments (like shared servers or Kubernetes clusters).
RUN useradd -ms /bin/bash appuser
USER appuser
- Now every process inside the container runs as
appuser
. -m
: create the user’s home directory (e.g.,/home/appuser
).-s /bin/bash
: set the user’s login shell to/bin/bash
.
You can still override:
docker run --user root myimage
WORKDIR
The WORKDIR
instruction sets the current working directory inside the container. Every subsequent RUN, CMD, ENTRYPOINT, COPY
, or ADD
will be executed relative to this directory.
You can use the WORKDIR
command to set the working directory to /usr/local/app
:
WORKDIR /usr/local/app
WORKDIR /usr/src/app
COPY . .
RUN pip install -r requirements.txt
COPY . .
: copies files into/usr/src/app
RUN pip install -r requirements.txt
: runs inside/usr/src/app
The WORKDIR
can be changed multiple times during an image build.
FROM ubuntu:22.04
# First working directory
WORKDIR /app
RUN echo "Hello" > hello.txt
# Change to another directory
WORKDIR /app/subdir
RUN echo "World" > world.txt
# Copy from build context relative to new WORKDIR
COPY main.py .