5 Steps to Set Up Google Compute Engine for YOLOv5

Jean-Sébastien Grondin
Oct 22, 2020

This guide is intended for new Google Cloud Platform (GCP) users who wish to build and train YOLOv5 object detection models in the cloud. I will show you how to set up a Compute Engine instance in GCP and a virtual environment with conda. I will also share some bonus tricks to make your life easier during training.

YOLOv5 is a very powerful and fast object detection model, released by Ultralytics in 2020, that has attracted a lot of interest and attention from the community. It needs an up-to-date environment with all of its dependencies installed, including CUDA, Python and PyTorch. Setting up that environment is the focus of this tutorial.

Requirements

You need to have:

(1) an activated/premium GCP account

(2) a project to work in

Results

(1) You will have a compute engine instance with the right environment to build and train custom YOLOv5 models

(2) I will also show you a neat trick for monitoring your experiments with TensorBoard

(3) I will also show you how to use tmux to launch a job on your instance and keep it running after you close your notebook or terminal

Let’s get at it!

Step #1 Creating a virtual machine instance

In the navigation menu, find the Compute Engine app and select VM instances, then click on Create.

Not all regions and zones offer the same machine types, so explore different options if you can’t find what you need. For this tutorial, we pick an N1-series machine in zone us-east1-c with 4 vCPUs and 26 GB of memory (the n1-highmem-4 machine type).

Under GPUs, click ‘Add GPU’ and add the desired type. We select one NVIDIA Tesla P100. Its computing performance is roughly equivalent to what a Google Colab Pro account offers, but without the frustrating time restriction. We also replace the default boot disk with a 100 GB SSD persistent disk running Ubuntu 20.04 LTS.

Under Firewall, tick ‘Allow HTTP traffic’ and ‘Allow HTTPS traffic’. Then click ‘Create’.
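If you prefer the command line, the console settings above can be sketched as a single gcloud command. This is a sketch under several assumptions. The gcloud CLI must be installed and authenticated, and GPU quota must already be available. The instance is named instance-1, the name also used in the TensorBoard bonus below. n1-highmem-4 is the N1 machine type with 4 vCPUs and 26 GB of memory. The helper name create_yolo_vm is hypothetical, and image families change over time, so check them against your project.

```shell
# Hypothetical helper: create the VM described above with the gcloud CLI.
# Assumes gcloud is installed, authenticated and pointed at your project.
create_yolo_vm() {
  local name="${1:-instance-1}"
  local zone="${2:-us-east1-c}"
  # n1-highmem-4 = 4 vCPUs and 26 GB of memory.
  # GPU VMs cannot live-migrate, so the maintenance policy must be TERMINATE.
  # The http-server/https-server tags are what the console's
  # "Allow HTTP/HTTPS traffic" checkboxes add.
  gcloud compute instances create "$name" \
    --zone="$zone" \
    --machine-type=n1-highmem-4 \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=100GB \
    --boot-disk-type=pd-ssd \
    --tags=http-server,https-server
}
```

Usage: `create_yolo_vm instance-1 us-east1-c`.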

Your newly created instance should now appear in the console. You may run into a warning message saying: “Quota ‘GPUS_ALL_REGIONS’ exceeded. Limit: 0.0 globally.”

If that is the case, you can fix it from the ‘Quotas’ page under ‘IAM & Admin’ in the navigation menu.

To change the quota, set the filter type to ‘gpus_all_regions’. Tick the ‘Global’ box in the right-hand pane, provide your contact information and change the limit to ‘1’. After submitting the request, you will need to wait for approval, which can take up to two business days. Note: to change this quota, your account must be activated (upgraded to premium) and you must have the Owner role on your project.
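You can also read the current limit from the command line, before or after filing the request. This is a sketch that assumes the gcloud CLI is installed and that the project-level quota metric is named GPUS_ALL_REGIONS, as in the warning above. The --flatten/--filter/--format combination and the helper names are our own choice, not an official recipe.

```shell
# Hypothetical helper: print the project-wide GPUS_ALL_REGIONS limit (e.g. "0.0").
gpu_quota_limit() {
  gcloud compute project-info describe \
    --flatten="quotas[]" \
    --filter="quotas.metric=GPUS_ALL_REGIONS" \
    --format="value(quotas.limit)"
}

# Succeed only if the limit passed as $1 allows at least one GPU.
# awk does the comparison because gcloud prints the limit as a float.
has_gpu_quota() {
  awk -v lim="$1" 'BEGIN { exit !(lim + 0 >= 1) }'
}
```

Usage: `has_gpu_quota "$(gpu_quota_limit)" && echo "GPU quota OK"`.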

Congratulations! Your instance can now be started!

CAREFUL: Make sure to stop your instance when it is not in use, or you will be charged for the attached resources!
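Stopping and starting the instance can also be scripted from any terminal with gcloud. The sketch below uses hypothetical helper names and defaults to instance-1 in us-east1-c, matching the choices made above. A stopped instance is no longer billed for its vCPUs and GPU, but its persistent disk is still charged.

```shell
# Hypothetical helpers: stop/start the VM with gcloud.
# Arguments: [instance name] [zone]; defaults match this tutorial.
stop_yolo_vm() {
  gcloud compute instances stop "${1:-instance-1}" --zone="${2:-us-east1-c}"
}

start_yolo_vm() {
  gcloud compute instances start "${1:-instance-1}" --zone="${2:-us-east1-c}"
}
```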

Step #2 Setting up your environment using conda

Once the instance is started, you can connect to it via SSH using the ‘SSH’ button next to the three vertical dots. This opens a terminal window in your browser. You can then install conda using this guide.
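The linked guide is not reproduced here. As one possible sketch, a non-interactive Miniconda install looks like the block below. It assumes the official "latest" Miniconda3 installer for Linux x86_64 and an install prefix of $HOME/miniconda3. install_miniconda is a hypothetical name.

```shell
# Hypothetical helper: non-interactive Miniconda install into a prefix.
install_miniconda() {
  local prefix="${1:-$HOME/miniconda3}"
  local installer=/tmp/miniconda.sh
  # Do nothing if conda is already installed in this prefix.
  if [ -x "$prefix/bin/conda" ]; then
    echo "conda already installed in $prefix"
    return 0
  fi
  wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O "$installer" || return 1
  bash "$installer" -b -p "$prefix"   # -b: batch (no prompts), -p: install prefix
  "$prefix/bin/conda" init bash       # adds conda setup to ~/.bashrc
  rm -f "$installer"
}
```

After running `install_miniconda`, open a new SSH window or run `source ~/.bashrc` so that `conda init` takes effect.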

You can now set up the package environment for your model. We first create a conda environment with conda create --name <ENVNAME>, then activate it with conda activate <ENVNAME>. Now we can install the required packages, starting with the requirements of YOLOv5:

conda install cudatoolkit=10.1.243
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
conda install -c anaconda cython
conda install -c conda-forge matplotlib
conda install -c anaconda pyyaml
conda install -c anaconda scipy
conda install -c conda-forge tqdm
conda install -c conda-forge opencv
conda install -c conda-forge tensorboard

At the time of writing, the above installs PyTorch 1.6.0, the minimum version required by YOLOv5. You can verify this with conda list. Note: if the opencv installation fails with the error Solving environment: failed with initial frozen solve, these steps can help you fix the issue.
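Instead of scanning the output of conda list, you can check the version directly. This sketch assumes that `python` in the activated environment is the one conda installed. version_ge relies on GNU `sort -V`, which is available on Ubuntu. Both helper names are hypothetical.

```shell
# Succeed if version $1 >= version $2 (GNU sort -V orders version strings).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Check that the PyTorch in the active environment meets YOLOv5's minimum.
check_torch() {
  local v
  v=$(python -c 'import torch; print(torch.__version__)') || return 1
  v="${v%%+*}"   # drop local build suffixes such as "+cu101"
  if version_ge "$v" 1.6.0; then
    echo "PyTorch $v OK"
  else
    echo "PyTorch $v is older than 1.6.0" >&2
    return 1
  fi
}
```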

You can also install Jupyter, which you can use to load the YOLOv5 tutorial as well as to build and train models:

conda install jupyter

To make the newly created environment available in Jupyter, we will use ipykernel:

conda install -c anaconda ipykernel

Finally, after installing ipykernel, we can use the following to register the environment as a Jupyter kernel:

python -m ipykernel install --user --name=<ENVNAME>
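To confirm that the kernel was registered, run `jupyter kernelspec list`, which prints one line per kernel. The hypothetical helper below reads that listing on standard input. It succeeds if a kernel with the given name is present, such as the example name yolo5.

```shell
# Succeed if a kernel named $1 appears in `jupyter kernelspec list` output (stdin).
has_kernel() {
  # Each kernel line starts with its name; the header line is simply ignored.
  awk -v k="$1" '$1 == k { found = 1 } END { exit !found }'
}
```

Usage: `jupyter kernelspec list | has_kernel yolo5 && echo "kernel registered"`.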

We will also install tmux. TensorBoard was already installed above. I will show you how to use both in the bonus sections below.

conda install -c conda-forge tmux

That’s it! Your environment should now be ready! Let’s now install the GPU drivers.

Step #3 Install GPU drivers

To install the GPU driver, you can follow Google’s recommended steps, which are listed below for convenience:

A) If you do not have your instance terminal open, you can SSH into your instance with the same method previously used.

B) Since we are using an Ubuntu 20.04 OS on the instance, we can copy-paste and execute the following:

curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"

Then we can update the package list:

sudo apt update

Finally, we can install CUDA, which includes the NVIDIA driver:

sudo apt install cuda

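You should now be able to run nvidia-smi. If its output lists the Tesla P100 along with the driver and CUDA versions, the driver was installed properly. If nvidia-smi instead reports that it couldn't communicate with the NVIDIA driver, rebooting the instance (sudo reboot) usually resolves it.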
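nvidia-smi can also print selected fields as CSV, which is easier to script than reading the full table. This is a sketch with hypothetical helper names, and it assumes the single P100 chosen in Step #1.

```shell
# Print name, driver version and total memory, one CSV line per GPU.
gpu_info() {
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

# Succeed if a GPU whose name contains $1 (e.g. "P100") is visible.
has_gpu() {
  gpu_info | grep -q -- "$1"
}
```

Usage: `has_gpu P100 && echo "driver OK"`.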

Step #4 Setting Up Jupyter Notebook

If you are interested in using Jupyter Notebook for your experiments, you can set it up with this very useful tutorial. Start from its step 4, since you should already have completed steps 1–3.

Once you are in Jupyter Notebook, you can create your first notebook. When doing so, make sure to select the environment you created in Step #2 (e.g. yolo5), which is available thanks to ipykernel.

Once in your notebook, you can make one final check that PyTorch can use your GPU. Run import torch followed by torch.cuda.is_available(), which should return True.
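You can run the same check from the instance terminal without opening a notebook. This sketch assumes the conda environment from Step #2 is active. torch_sees_gpu is a hypothetical name.

```shell
# Succeed (and print the device name) if PyTorch in the active
# environment can use a CUDA device; fail with a message otherwise.
torch_sees_gpu() {
  python -c '
import sys
import torch
if not torch.cuda.is_available():
    sys.exit("CUDA is not available to PyTorch")
print(torch.cuda.get_device_name(0))
'
}
```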

Step #5 Cloning YOLOv5’s repo

You can now clone Ultralytics’ YOLOv5 repository:

git clone https://github.com/ultralytics/yolov5
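Running git clone a second time fails because the target folder already exists. The sketch below clones once and pulls the latest changes on later runs. get_yolov5 is a hypothetical name.

```shell
# Clone YOLOv5 into $1 (default ./yolov5); if it is already there, update it.
get_yolov5() {
  local dir="${1:-yolov5}"
  if [ -d "$dir/.git" ]; then
    git -C "$dir" pull --ff-only
  else
    git clone https://github.com/ultralytics/yolov5 "$dir"
  fi
}
```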

If you want to train or fine-tune your model on the COCO dataset, you can download it with the command below. The script’s name and location may differ in newer versions of the repository.

bash ./yolov5/data/get_coco2017.sh
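The COCO 2017 download is large. The train and validation image archives alone are roughly 19 GB, and unzipping them needs extra room, so it can be worth checking free space first. In this sketch, the 40 GiB threshold in the usage line and the helper name are our assumptions.

```shell
# Succeed if the filesystem holding path $1 has at least $2 GiB available.
has_free_gb() {
  local avail_kb
  # -P: POSIX output (one line per filesystem), -k: sizes in 1 KiB blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}
```

Usage: `has_free_gb . 40 && bash ./yolov5/data/get_coco2017.sh`.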

If you instead want to train your model on custom data, you can zip it and transfer it using any of the methods listed in this Google Cloud tutorial.
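As one option, you can zip the dataset on your local machine and copy it with gcloud compute scp. This sketch runs locally. It assumes the gcloud CLI and zip are installed there, and that the instance name and zone are the ones used above. upload_dataset is a hypothetical name.

```shell
# Zip a local dataset folder and copy the archive to the instance's home directory.
upload_dataset() {
  local src="${1%/}"                  # dataset folder, trailing slash removed
  local instance="${2:-instance-1}"
  local zone="${3:-us-east1-c}"
  local archive
  archive="$(basename "$src").zip"
  zip -qr "$archive" "$src" || return 1
  gcloud compute scp "$archive" "$instance:~/" --zone="$zone"
}
```

Usage: run `upload_dataset data/my_dataset/` locally, then run `unzip my_dataset.zip` in the instance terminal.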

You should now be fully prepared for training your YOLOv5 model!

CAREFUL: Make sure to stop your instance when it is not in use, or you will be charged for the attached resources!

-- Happy Training!

BONUS: A neat trick with tensorboard

In your instance SSH terminal, enter the following command:

tensorboard --logdir <LOG_DIRECTORY> --port=<PORT>

Here, <LOG_DIRECTORY> is the path where you chose to save YOLOv5’s logs and weights during training. Pick any unused port. For this example, we use --port=8008.
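Run this way, tensorboard stays in the foreground and occupies the SSH terminal. One alternative is to start it in the background and keep its PID, as in this sketch. The default log directory yolov5/runs is an assumption, so use the path where your training writes its logs. The helper name is hypothetical.

```shell
# Start TensorBoard in the background on the instance and print its PID.
start_tensorboard() {
  local logdir="${1:-yolov5/runs}"
  local port="${2:-8008}"
  # nohup keeps it running if the SSH session closes; output goes to a log file.
  nohup tensorboard --logdir "$logdir" --port="$port" > tensorboard.log 2>&1 &
  echo "$!"
}
```

Usage: `pid=$(start_tensorboard yolov5/runs 8008)`, and later `kill "$pid"` to stop it.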

Now open a terminal on your local Linux machine, which needs the gcloud CLI installed, and enter:

gcloud compute ssh instance-1 -- -NfL 8008:localhost:8008

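You should now be able to open a web browser and access your TensorBoard dashboard at localhost:8008. Congratz! In this command, -N tells ssh not to run a remote command, -f sends it to the background, and -L forwards local port 8008 to port 8008 on the instance.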

When you are done, if you want to find and kill the process that corresponds to port 8008, you can use:

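lsof -ti tcp:8008 -sTCP:LISTEN

then use the returned PID to stop that process with kill <PID>, or kill -9 <PID> if it does not exit. Restricting lsof to the listening socket avoids also matching other processes connected to that port, such as your browser.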
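The two steps can be combined into one helper, sketched below with a hypothetical name. On your local machine it stops the SSH tunnel. On the instance it stops TensorBoard.

```shell
# Stop whatever is listening on TCP port $1 (default 8008).
free_port() {
  local port="${1:-8008}"
  local pids
  # -t: print PIDs only; -sTCP:LISTEN: listening sockets only, so clients
  # such as browser tabs connected to the port are not matched.
  pids=$(lsof -t -i tcp:"$port" -sTCP:LISTEN) || {
    echo "nothing listening on port $port"
    return 0
  }
  kill $pids   # SIGTERM (unquoted on purpose: may hold several PIDs)
}
```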

BONUS: A neat trick with tmux

If you are new to tmux, you will likely be amazed by its simplicity and how useful it can be. You should already have it installed from Step #2. We decide to call our session yolo-1, so we type the following in the instance terminal:

tmux new -s yolo-1

This creates a tmux terminal window, which we can use to launch our training script (e.g. python train.py ... ). After launching the script, you can detach from the tmux session by pressing Ctrl+B, then D. Once detached, the session keeps running in the background. You can close the SSH window, or even log out of the GCP console, and the job will continue as long as you don’t stop the instance. To re-attach to the tmux session, simply type the following in the instance terminal:

tmux attach-session -t yolo-1
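tmux can also start a job directly in a detached session, which is handy for scripting. This is a sketch. run_detached is a hypothetical name, and the command string is whatever you would otherwise type in the tmux window.

```shell
# Start a command in a new, detached tmux session named $1.
run_detached() {
  local session="$1"
  shift
  # Refuse to clobber an existing session with the same name.
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "tmux session '$session' already exists" >&2
    return 1
  fi
  tmux new-session -d -s "$session" "$*"
}
```

Usage: `run_detached yolo-1 'cd ~/yolov5 && python train.py'`, adding your usual training arguments, then `tmux attach-session -t yolo-1` to watch it.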

As a simple test, you can type the following in a tmux terminal:

i=0; while [ $i -le 1000 ]; do echo Number: $i; ((i++)); sleep 10; done

A new number is printed every 10 seconds. Now detach from the session, close your terminal, re-open it and reattach to your tmux session. You should see that the job has kept running.

If you are curious to discover other interesting features of tmux, you may find this tutorial useful.

— — —

Let’s connect: https://www.linkedin.com/in/jsgrondin/

Follow me on medium: https://medium.com/@grondin.js

— — —
