Docker and Files

Docker is great for setting up build environments. However, if you are trying to build a Docker image with a pre-installed IDE or some other large piece of software, a problem emerges: how do you use the COPY or ADD instructions without leaving behind a large, unwanted layer?

Multi-stage builds to the rescue

Use this approach to save space and keep the large .tar.gz file out of the history of your final Docker image.

FROM ubuntu:jammy-20230308 AS builder

# copy large file in
COPY CCS12.5.0.00007_linux-x64.tar.gz /opt/CCS12.5.0.00007_linux-x64.tar.gz

# Do any extraction work
RUN cd /opt && tar -xf CCS12.5.0.00007_linux-x64.tar.gz

###############################################################################
FROM ubuntu:jammy-20230308 AS final

# Copy the extracted folder only
COPY --from=builder /opt/CCS12.5.0.00007_linux-x64 /opt/CCS12.5.0.00007_linux-x64

# The rest of your Dockerfile continues as normal

Results

A temporary build stage called builder is created. The COPY instruction is used here, and it always leaves behind a layer. The archive is then extracted, and you can do any other preparation work in this stage. In the final stage, you copy over only the extracted folder. Note that you don’t need to delete the tar.gz, because none of the layers from builder become part of the final image. They may still sit in your local build cache, but they are not in the image you tag and push.

Further Background

The layer that COPY and ADD leave behind is a significant problem. Similar to commits in git, each layer is a permanent record in the history of your image. This means that if you add a large file in one layer and delete it in the next, the image stays larger by the size of that file, despite the deletion.
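To make the problem concrete, here is a minimal sketch of the single-stage pattern that does not save any space. It reuses the archive name from the example above; the point is only to show which layer keeps the file.

```dockerfile
FROM ubuntu:jammy-20230308

# Layer 1: the full archive is stored in this layer
COPY CCS12.5.0.00007_linux-x64.tar.gz /opt/CCS12.5.0.00007_linux-x64.tar.gz

# Layer 2: the extracted files are stored here
RUN cd /opt && tar -xf CCS12.5.0.00007_linux-x64.tar.gz

# Layer 3: this only hides the archive in the final filesystem;
# layer 1 still contains it, so the image does not shrink
RUN rm /opt/CCS12.5.0.00007_linux-x64.tar.gz
```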

A new layer is created by every RUN instruction, just as it is by COPY and ADD. You will often see tips online where a large file is created, used, and then deleted all within the same RUN command, so that it never ends up in any layer. This approach works great if the file comes in via wget. However, the multi-stage trick described above is the better fit if the file comes from COPY or ADD, because the file lands in its own layer before any RUN command gets a chance to delete it.
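For comparison, the single-RUN approach might look roughly like the sketch below. The URL and archive name are placeholders for illustration, not a real download location.

```dockerfile
FROM ubuntu:jammy-20230308

# Download, extract, and delete in ONE RUN instruction, so the
# archive never persists in any layer.
# NOTE: https://example.com/tool.tar.gz is a hypothetical URL.
RUN apt-get update \
 && apt-get install -y --no-install-recommends wget ca-certificates \
 && rm -rf /var/lib/apt/lists/* \
 && wget -O /tmp/tool.tar.gz https://example.com/tool.tar.gz \
 && tar -xf /tmp/tool.tar.gz -C /opt \
 && rm /tmp/tool.tar.gz
```

If the download, extraction, and cleanup were split across separate RUN instructions, the archive would be kept in the layer created by the download step.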

An additional optimization would be to run the full installation in the builder stage, and copy only the final, installed files into the final stage.
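A rough sketch of that idea follows. The installer script name, its --prefix flag, and the /opt/ccs install directory are all hypothetical stand-ins; substitute the actual unattended install command for your software.

```dockerfile
FROM ubuntu:jammy-20230308 AS builder

COPY CCS12.5.0.00007_linux-x64.tar.gz /opt/CCS12.5.0.00007_linux-x64.tar.gz

# Extract and install in the throwaway stage.
# NOTE: install.sh and --prefix are hypothetical placeholders for
# whatever installer the extracted folder actually provides.
RUN cd /opt \
 && tar -xf CCS12.5.0.00007_linux-x64.tar.gz \
 && ./CCS12.5.0.00007_linux-x64/install.sh --prefix /opt/ccs

###############################################################################
FROM ubuntu:jammy-20230308 AS final

# Copy only the installed result; the archive and the extracted
# installer files stay behind in the builder stage
COPY --from=builder /opt/ccs /opt/ccs
```

One caveat with this variant: anything the installer writes outside the copied directory (for example, system packages or files under /etc) does not carry over, so those steps would need to be repeated in the final stage.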