Running Simulations on Azure

Lab head: Andy Wills

This manual was written by Lenard Dome.

Azure is a cloud computing solution that provides services including virtual machines (VMs), servers for data processing, and computing clusters for batch jobs. An Azure system is now one of the top 10 supercomputers in the world (source).

In the lab, we use Azure to run computationally demanding jobs that would require a supercomputer to finish in a reasonably humane amount of time. The University of Plymouth also has a supercomputer, but it is maintained by other departments and is less well supported at BRIC than Microsoft Azure.

Set up Microsoft Azure on Linux

In the lab, we run Linux, so this manual is specifically aimed at getting you ready to use Azure via the Linux CLI (e.g. bash, zsh, fish, …). Most of the solutions will also work on Windows, but they have not been tested there. Alternatively, you can install Ubuntu on Windows as a subsystem; see more info here.

This document also assumes that you have the right access to Azure. If you still need to arrange access, you won’t be able to follow this manual.

You should have also received the following information:

  • username: your username for the VM.

  • host: an IP address of the VM.

  • resource-group: the name of your resource group.

  • vm-name: the name of the virtual machine.

Our currently available VM instances at BRIC

[vm-bric-spot-comp-modelling]

  • Ubuntu 20.04.3 LTS (GNU/Linux 5.8.0-1043-azure x86_64)

  • 3x AMD EPYC 7452 32-Core Processor

  • 96 vCPUs

  • 385 GB RAM

  • 200 GB Storage (Shared)

  • R version 4.1.2 (2021-11-01)

  • Spot Pricing

Connecting to the University’s Network

In order to connect to and manage your VM instance, you will need to be connected to eduroam on campus, or to the University’s VPN. There is a good deal of documentation on how to set up the VPN on Linux.

Using Azure CLI to manage your VM

Install Azure CLI for Linux according to your preferred method. There is official documentation on how to do it here.

After installing it, you need to log in to your account with

az login

This command will open a browser window, where you will need to log in with your University email and password. In the future, there will be multiple different types of instance - e.g. one with a high number of CPUs optimized for computing and one with a high-end GPU for deep convolutional networks. You can list all VMs that are accessible to you with

az vm list

before selecting which one to start for the job.
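
You can get a more readable overview, including each VM’s power state, with the CLI’s table output; a minimal sketch (the -d/--show-details and -o table flags are standard Azure CLI options):

# list accessible VMs with their power state in a compact table
az vm list -d -o table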

To start the VM, use

az vm start -g resource-group -n vm-name

To shut down the VM, you will need to use az vm deallocate. Simply using az vm stop will stop the VM, but a stopped VM will still incur charges.

az vm deallocate -g resource-group -n vm-name

Note that there will be a lag between starting your VM and being able to log in via ssh - this is probably due to the boot process.
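
If you want to check from your own machine whether the VM has finished booting before attempting ssh, you can query its power state; a minimal sketch (substitute your own resource-group and vm-name):

# query the current power state of the VM
az vm show -d -g resource-group -n vm-name --query powerState -o tsv
# prints e.g. "VM running" once the machine is up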

It is very important to set up the Azure CLI on the VM as well. Once you have logged in to your VM (details below), make sure to run az login there too. Beware that you need to re-authenticate every 3 months.
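
If you are unsure whether your session has expired, az account show reports the currently authenticated account and fails if you need to log in again:

# check whether the CLI is still authenticated
az account show -o table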

Logging in

There are various solutions, such as PuTTY, xRDP (which seems the best for regular psychologists), and ssh. I use plain ssh.

ssh username@host

Enter your password when prompted. Alternatively, you can use sshw to manage multiple logins. You can also use ssh keys to avoid managing passwords. This askubuntu answer can be of use. You can copy files to the instances by using scp.

# copy a file
scp my-files-to-copy username@host:/home/path/to/my-project/
# copy everything in the current directory (-r recurses into subdirectories)
scp -r * username@host:/home/path/to/my-project/
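
For the key-based login mentioned above, a minimal sketch (the key type and comment are just examples):

# generate a key pair (accept the default location when prompted)
ssh-keygen -t ed25519 -C "azure-vm"
# copy the public key to the VM so future logins skip the password prompt
ssh-copy-id username@host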

It is also possible to do this with azure-cli, but I couldn’t set it up with my account due to server-side authorization problems.

The code that you should run on Azure

When you should run your code on Azure:

  • Your script takes more than a week to run on the local lab machine (see the sketch after this list for timing a trial run).

  • You need a lot of memory (> 64 GB).

  • You need a lot of GPU memory (> 5 GB).

  • The simulation script can run by itself and write the results to disk without any human intervention.
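
If you are unsure whether your job clears these thresholds, GNU time can report both the wall time and the peak memory of a scaled-down trial run; a minimal sketch (the script name is a placeholder, and you may need to install the time package on Ubuntu):

# report wall time and peak memory ("Maximum resident set size") of a trial run
/usr/bin/time -v Rscript my-scaled-down-simulation.R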

How to install R packages

The root directories on the VM will not be writeable by either you or R. The .libPaths() function will enable you to set a new target directory for the packages you are about to install.

.libPaths("/home/your/new/path")

In most scenarios, use as few packages as possible. It is recommended not to load or install metapackages like tidyverse, as they will almost certainly fail to install due to some obscure missing dependency. Instead, it is better to simply use dplyr and reshape2, as these packages contain most of the tidyverse utilities you are likely to need, such as %>%, summarize, and mutate. If you feel more comfortable with R, data.table is the best choice, as it is much faster and already multi-threaded; see more on the package here.
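
As an alternative to calling .libPaths() in every script, you can register a personal library once via ~/.Renviron; a minimal sketch from the shell (the paths are examples):

# create a personal package library
mkdir -p ~/R/library
# tell R to use it by default for installs and library()
echo 'R_LIBS_USER=~/R/library' >> ~/.Renviron
# packages now install there without root access
Rscript -e 'install.packages("data.table", repos = "https://cloud.r-project.org")'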

How to load packages in your script

In case you want to safeguard against missing packages, the snippet below installs any missing packages before the rest of your code is run.

## If a package is installed, it will be loaded. If any
## are not, the missing package(s) will be installed
## from CRAN and then loaded.
## https://vbaliga.github.io/verify-that-r-packages-are-installed-and-loaded/

## First specify the packages of interest
packages <- c("catlearn", "data.table",
             "psp", "doParallel")

## Now load or install&load all
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE, ask = FALSE)
      library(x, character.only = TRUE)
    }
  }
)

I didn’t test the Python version on the VM, but this seems to be at least a locally working solution:

try:
    import scipy
except ImportError:
    # install into the user site directory, then retry the import
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip",
                           "install", "--user", "scipy"])
    import scipy

Note that if you are using Python, the --user flag installs packages under your home directory (~/.local), so you do not need write access to the system directories.
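
Alternatively, a virtual environment sidesteps --user entirely and keeps each project’s dependencies separate; a minimal sketch (the path is an example, and Ubuntu may need the python3-venv package installed first):

# create and activate a project-specific environment
python3 -m venv ~/venvs/my-project
source ~/venvs/my-project/bin/activate
# installs now go into the environment, so no --user is needed
pip install scipy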

Writing files to disk

There is a shared /DATA folder that can be mounted on the instance and accessed from a personal machine. You should have received information about where it is. I recommend you double-check on the VM and use the absolute path when saving the simulation results.

This is 200 GB of storage, but it can be increased if needed.

You should save your output in this folder. The shared storage provides you with access to your results even after the VM has been shut down, deallocated, or deleted.
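
Before launching a long job, it is worth confirming that the share is actually mounted and has free space; a minimal sketch (assuming the mount point is /DATA, as above; the subdirectory is an example):

# confirm the shared storage is mounted and check free space
df -h /DATA
# keep results under your own subdirectory
mkdir -p /DATA/username/my-project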

You can access your files via Azure Storage Explorer. Download it here. It can also be installed via snap:

# install software
sudo snap install storage-explorer

# allow it to use password manager services
snap connect storage-explorer:password-manager-service :password-manager-service

Executing Your Script

There are multiple ways to run your code - e.g. screen - but a bash script is best. It allows the instance to shut itself down when the job finishes - even when the simulations halt due to an error. You don’t want to keep paying for an idle instance.

The bash script will be a file ending in .sh and beginning with #!/bin/sh. See the example below:

#!/bin/sh

# allow errors so the bash script keeps running even
# if your code resulted in an error
set +e

echo "LANG" $LANG

# run script

echo "Start of job at"
date

# run script and save command line outputs to log
Rscript my-unashamedly-parallelized-simulation.R | tee -a "log.$(date +"%m-%d-%y").out"

# or if you use a python script that is not executable as a program
python my-embarrassingly-cpu-heavy-job.py | tee -a "log.$(date +"%m-%d-%y").out"

echo "End of job at"
date

# gracefully shut down instance so you don't have to pay for surplus time
az vm deallocate -g resource-group -n vm-name

Note that I would recommend testing a scaled-down version of your code first and making sure it runs and saves what you need to disk. Because errors are allowed (set +e), the script will still execute the remaining commands and deallocate the instance even if your simulation fails.

Executing your script on the Virtual Machine

After writing, testing, and uploading your bash script, run the following commands:

# make it executable
chmod u+x my-bash-script.sh
# run the script in the background, logging output to a file
nohup ./my-bash-script.sh > output.log.out &
# or run it in the background with output going to the default nohup.out
nohup ./my-bash-script.sh &

I would generally recommend using screen, so that you can return to your workspace. It will make your life easier, especially if you are monitoring what you are doing or your code prints out info about the simulation. You don’t have to worry about this with xRDP.
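
A minimal screen workflow (the session name is an example):

# start a named session and launch the job inside it
screen -S simulations
./my-bash-script.sh
# detach with Ctrl-a d, then reattach later with
screen -r simulations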

Checking jobs

If you want to see how your script is running, you can use htop to check on the number of cores in use, or inspect output.log.out to see where things are; both are sketched below.
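
A minimal sketch of both checks:

# interactive overview of per-core load and memory use
htop
# follow the log as new lines are written
tail -f output.log.out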