How to Create an NVIDIA GPU-Enabled Virtual Machine on Azure
Hi fellows, in this article I want to focus on creating a GPU-enabled virtual machine on Azure and give you a step-by-step guideline. I haven't found many articles on this topic, even though CUDA version conflicts and driver issues are common. So I want to document the process so we can help each other. Enjoy!
Creating an Azure VM with a GPU-Enabled Instance
You can check this page to see which regions are enabled or disabled for your instance specifications. However, you might not always be able to select them in the end: an error sometimes occurs when your subscription plan is not eligible for that step. In this case, select a different option or request a quota. (Azure VM Select)
When you select all “region” options, you will see many choices, and some are cheaper than others. GPU-enabled workloads are sometimes unavailable because of high demand, so you might need to select specific locations.
Sometimes that page shows an instance at a reasonable price that is not available in your Azure portal. In my case, I found an A10 GPU machine on that page, but Azure gave me a T4 GPU instance after I requested a quota. They told me they retire their older instances each month.
In the example below, I had to submit a “request quota” to get a T4 GPU-enabled instance.
If you stay within the limit and don't request an extreme amount, they usually approve the request within 1–2 days. After that, you can create your GPU-enabled VM. If you already have a quota, or get one after your request, you can create the VM as shown below. I created mine with these settings.
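If you prefer the command line to the portal, the Azure CLI can answer the same two questions: which GPU sizes exist in a region, and how much quota you have left there. The sketch below assumes the Azure CLI (`az`) is installed and that you are logged in with `az login`. The helper names `list_gpu_sizes`, `show_quota`, and `gpu_model` are hypothetical, and `gpu_model` only knows the two GPU families mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the Azure CLI (az) is installed and you ran `az login`.
# Helper names are hypothetical; nothing here runs until you call a function.

# List every N-series (GPU) size in a region. --size accepts a partial name,
# and --all also shows sizes your subscription is currently restricted from.
list_gpu_sizes() {
  az vm list-skus --location "$1" --size Standard_N --all --output table
}

# Show current usage vs. limit per VM family in a region, to see whether
# a quota request is needed before you create the VM.
show_quota() {
  az vm list-usage --location "$1" --output table
}

# Map a VM size name to its GPU model (only the families in this article).
gpu_model() {
  case "$1" in
    *_T4_v3)  echo "T4" ;;   # e.g. Standard_NC4as_T4_v3
    *_A10_v5) echo "A10" ;;  # e.g. Standard_NV36ads_A10_v5
    *)        echo "unknown" ;;
  esac
}
```

For example, `list_gpu_sizes westeurope` followed by `show_quota westeurope` shows whether a T4 size is offered in that region and whether your family quota is still zero. The region is only an example.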
CUDA Settings and NVIDIA CUDA Driver Installation
Please follow these steps carefully, because missing a step can cost you a lot of time. Azure GPU instances require several specific settings.
If you use Ubuntu's automatic driver installation, you may run into errors and bugs as I did. The end goal is to create a GPU-enabled VM, run the Whisper Speech-to-Text (STT) model on it, and then dockerize the model and run it in a GPU-enabled Docker container. This article covers the first part.
First, connect to your Azure machine via SSH (or another suitable method). Then clean up and update the packages.
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get update
sudo apt-get upgrade
# If you have already installed NVIDIA packages, or are not sure, run these clean-up commands.
sudo apt-get --purge remove "*nvidia*"
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
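Before you install anything, it can help to confirm that the clean-up actually worked. The small filter below is a sketch. It assumes the standard `dpkg -l` layout, where column 1 is the package status and column 2 is the package name. It prints any NVIDIA/CUDA-related package that is still installed (`ii`) or that left residual config files behind (`rc`). The function name `leftover_gpu_pkgs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `dpkg -l` output on stdin and print leftover GPU packages.
# Column 1 is the status ("ii" = installed, "rc" = removed, config remains);
# column 2 is the package name. Header lines never match, so they are skipped.
leftover_gpu_pkgs() {
  awk '($1 == "ii" || $1 == "rc") && $2 ~ /(nvidia|cuda|cublas|nsight)/ { print $2 }'
}
```

Usage: `dpkg -l | leftover_gpu_pkgs`. Empty output means the package side is clean. `ls -d /usr/local/cuda* 2>/dev/null` should also print nothing after the `rm -rf` step.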
Then, install the build-essential tools and the ubuntu-drivers-common package.
sudo apt-get install build-essential
sudo apt-get update
sudo apt install ubuntu-drivers-common
After you run the command below, you will see the list of supported drivers. You can choose one of them or use the auto-install option. However, the auto-install option caused several bugs for me, so I selected a driver manually.
sudo ubuntu-drivers devices
sudo apt install nvidia-driver-535-server
The “server” variant is important for our VM. I tried both on my Azure Linux instance: the “server” driver worked, while nvidia-driver-535 (listed as “distro non-free recommended”) caused driver conflicts. You can still try the non-server variant if the server one gives you problems. Do not forget: before a re-installation, you must remove every NVIDIA package to get a fresh install. Use the removal commands above, which also delete the CUDA folders under /usr/local.
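If the `ubuntu-drivers devices` list is long, you can filter out only the “server” variants. The sketch below reads that command's output on stdin and prints the matching package names, oldest branch first. It assumes the packages follow the `nvidia-driver-<branch>-server` naming shown above. The function name `list_server_drivers` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract nvidia-driver-<branch>-server names from `ubuntu-drivers devices`
# output (stdin), de-duplicate them, and sort numerically by branch number.
# With -t- the branch number is field 3 (nvidia / driver / 535 / server).
list_server_drivers() {
  grep -o 'nvidia-driver-[0-9][0-9]*-server' | sort -u -t- -k3,3n
}
```

Usage: `sudo ubuntu-drivers devices | list_server_drivers`. The last line is the newest server branch. Check it against the CUDA version you plan to install before passing it to `sudo apt install`.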
If you don't need to re-install, your next step is to reboot the instance before you continue.
sudo reboot
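After the reboot, a quick way to confirm that the driver is loaded is `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`. It prints one line per GPU in the form `<name>, <driver version>`. The sketch below checks that the driver branch in such a line matches the one you installed (535 in this article). The function name `driver_branch_ok` is hypothetical, and the sample line in the usage note is illustrative, not real output from my VM.

```shell
#!/usr/bin/env bash
# Sketch: $1 is one CSV line from
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
# and $2 is the expected driver branch (e.g. 535). Returns 0 on a match.
driver_branch_ok() {
  local version="${1##*, }"     # keep the text after the last ", " (driver version)
  [ "${version%%.*}" = "$2" ]   # compare only the major (branch) number
}
```

Usage: `driver_branch_ok "$(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | head -n 1)" 535 && echo "driver OK"`. If `nvidia-smi` fails outright, the driver is not loaded and you should go back to the clean-up step.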
Next, we install a suitable CUDA Toolkit on the machine. Make sure you select the right version for your machine, because a wrong version caused driver conflicts for me.
Select the appropriate Linux version for your machine at the link below, then follow each step.
To find out which CUDA Toolkit version is suitable, run the command below.
nvidia-smi
# Note the driver version (535.xx here) and the "CUDA Version" field in the header
Then read the documentation and check the version compatibility table. That is how I selected my version.
In my case, 12.2 is suitable, so I installed CUDA Toolkit 12.2. It might be different for you, so select the right version, and the most recent one that is still suitable.
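The “CUDA Version” in the `nvidia-smi` header is the newest CUDA version the installed driver supports, so the toolkit you pick should not be newer than it. The sketch below pulls that number out of the `nvidia-smi` output. It assumes the header contains the text `CUDA Version: X.Y`. The function name `max_cuda_for_driver` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `nvidia-smi` output on stdin and print the "CUDA Version" value
# from its header, i.e. the newest CUDA version the driver supports.
max_cuda_for_driver() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: `nvidia-smi | max_cuda_for_driver`. With a 535 driver, this should print 12.2, which matches the toolkit version I chose.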
Follow the steps on that page (they might differ for you) and run these commands.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
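One optional variation, based on the meta-package table in NVIDIA's Linux installation guide: the plain `cuda` package tracks the newest CUDA release and also installs NVIDIA's driver packages. A versioned `cuda-toolkit-X-Y` package installs only the toolkit for that version and leaves the 535-server driver you installed earlier alone. The sketch below builds that package name from a version string. The helper `toolkit_pkg` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn a CUDA version such as "12.2" into the versioned toolkit
# package name used by NVIDIA's apt repository ("cuda-toolkit-12-2").
toolkit_pkg() {
  printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}
```

Usage, after the keyring and `apt-get update` steps above: `sudo apt-get -y install "$(toolkit_pkg 12.2)"`. This pins the toolkit to 12.2 instead of whatever release the `cuda` meta-package currently points to.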
After installation, check your versions with the "nvcc --version" and "nvidia-smi" commands. If nvcc is not found, add /usr/local/cuda/bin to your PATH first (for example, export PATH=/usr/local/cuda/bin:$PATH). If a command still fails, you most likely installed the wrong version.
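To turn "check your versions" into a yes/no answer, you can compare the toolkit release reported by `nvcc --version` with the driver's “CUDA Version” from `nvidia-smi`. The toolkit should not be newer than the driver's limit. The sketch below assumes the usual `release X.Y` text in the `nvcc` output and GNU `sort -V` for the version comparison. The helper names are hypothetical, and the final section only runs when both tools are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed toolkit is not newer than the driver supports.

# Read `nvcc --version` output on stdin and print the release, e.g. "12.2".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed when version $1 <= version $2 (GNU sort -V orders versions).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only runs on a machine that actually has both tools installed.
if command -v nvcc >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
  toolkit="$(nvcc --version | nvcc_release)"
  driver_max="$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)"
  if [ -n "$toolkit" ] && [ -n "$driver_max" ] && version_le "$toolkit" "$driver_max"; then
    echo "OK: toolkit $toolkit <= driver limit $driver_max"
  else
    echo "Check versions: toolkit '$toolkit', driver limit '$driver_max'" >&2
  fi
fi
```

`sort -V` is used instead of a plain string comparison so that, for example, 12.10 is correctly treated as newer than 12.2.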
If you see results like the ones above, everything is installed without conflicts. Please follow my next article to learn how to run the Whisper Speech-to-Text (STT) model on a GPU-enabled Azure virtual machine.