What is a Dockerfile?
As said above, a container is an encapsulated environment for running our algorithms. This environment is supported by the Docker extension, which handles containerization.
To create a container, we first need to define a Dockerfile, which specifies our system requirements. Think of the Dockerfile as a document or 'recipe' that defines our container template, which is called the Docker image.
Here is an example of a Dockerfile we will use as part of this tutorial:
FROM ubuntu:20.04
# Update and install necessary dependencies
RUN apt-get update && \
apt-get install -y python3-pip python3.8 git && \
apt-get clean
# Set working directory
WORKDIR /app
# Copy the requirements file and install dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
# Install transformers separately without its dependencies
RUN pip3 install --no-cache-dir --no-deps transformers
This dockerfile contains several important steps:
(1) We start from a base image that provides an Ubuntu environment.
(2) We install pip, Python, and Git. apt-get is a Linux command-line tool for package management.
(3) We set our working directory (in this example, /app).
(4) We copy the requirements.txt file into the image and install the requirements it lists.
A Dockerfile gives us a lot of flexibility. For instance, my repo relies on the transformers library without its dependencies, so I install it separately (the last line in the Dockerfile).
Note: Working with containers offers many benefits in terms of speed and agility, yet there are drawbacks as well. Security is one of them. Container images uploaded by untrusted sources might contain malicious content. Make sure you pull images from a trusted source and that your container is configured properly. Another option is to use security tools like Snyk, which scan your Docker image for potential vulnerabilities.
Prerequisites
Before we create a Docker container, we first need to make sure our local working environment is ready. Let's go through the following checklist:
1. VS Code as our code editor: https://code.visualstudio.com/
2. Git for version control management: https://git-scm.com/downloads
3. GitHub account: https://github.com/
4. Docker: https://www.docker.com/
After you complete all these prerequisites, make sure to sign in to the Docker app you have installed. This will enable us to create a Docker container and track its status.
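As a quick sanity check on the checklist above, here is a minimal Python sketch that looks for each tool on your PATH. The command names are assumptions: `code` is the VS Code launcher, which is only available if it was added to your PATH during installation, so a "not found" result for it does not necessarily mean VS Code is missing.

```python
import shutil

# Assumed command names for each prerequisite. "code" is the VS Code
# command-line launcher and may not be on PATH on every installation.
REQUIRED_TOOLS = {"VS Code": "code", "Git": "git", "Docker": "docker"}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each tool label to its executable path, or None if not found."""
    return {label: shutil.which(command) for label, command in tools.items()}


if __name__ == "__main__":
    for label, path in check_prerequisites().items():
        status = f"found at {path}" if path else "NOT found on PATH"
        print(f"{label}: {status}")
```

This only confirms that the executables can be found. It does not check that the Docker app is running or that you are signed in.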
Step 1 — Cloning the Repo
To begin, let's select a repo to work with. Here I provide a repo containing an algorithm that estimates whether a text is AI-generated by combining two signals: the model's perplexity for the text and the number of spelling errors. Higher perplexity means it is harder for the LLM to predict the next word, which suggests the text was written by a human rather than generated by a model.
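To make the idea concrete, here is a toy sketch of how the two signals could be combined. It is not the repo's implementation: the function names, the tiny dictionary, and the thresholds are all hypothetical, and the per-token log-probabilities are passed in directly, whereas in practice they would come from a language model.

```python
import math
import re

# Hypothetical mini-dictionary; a real checker would use a full word list.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat", "a"}


def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def count_spelling_errors(text, known_words=KNOWN_WORDS):
    """Count words that are missing from the dictionary (case-insensitive)."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return sum(1 for word in words if word not in known_words)


def looks_ai_generated(token_log_probs, text, max_perplexity=20.0, max_errors=0):
    """Flag text that is both easy to predict and free of spelling errors.

    The thresholds are illustrative assumptions, not tuned values.
    """
    return (
        perplexity(token_log_probs) <= max_perplexity
        and count_spelling_errors(text) <= max_errors
    )
```

With this sketch, a text whose tokens each have probability 0.5 gets a perplexity of about 2 and, if it has no typos, is flagged as likely AI-generated. The same text with a single misspelling such as "teh" is not flagged.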
The repo’s link:
On GitHub, click Code and copy the HTTPS address as follows:
After that, open VS Code and clone the repo you wish to include in your container. Make sure VS Code is connected to your GitHub account. Alternatively, you can initialize a new Git repo.
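If you prefer to script the clone instead of using the VS Code interface, the following minimal sketch wraps `git clone` in Python. The helper names are hypothetical, and the URL in the usage comment is a placeholder: substitute the HTTPS address you copied from GitHub.

```python
import subprocess


def build_clone_command(repo_url, target_dir):
    """Return the git command that clones repo_url into target_dir."""
    return ["git", "clone", repo_url, target_dir]


def clone_repo(repo_url, target_dir):
    """Run git clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(build_clone_command(repo_url, target_dir), check=True)


# Example usage (placeholder URL, replace with the address you copied):
# clone_repo("https://github.com/<user>/<repo>.git", "my-repo")
```

Passing the command as a list rather than a single string avoids shell quoting issues if the target directory contains spaces.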