Quick Start

This Quick Start Tutorial is meant to provide a very short introduction for those who are new to High Performance Computing, or simply wish to brush up on the basics. It covers some concepts that are general to HPC, explains its basic philosophy, and should help you decide whether and how you can deploy it in your research.

Overview

What is HPC?

HPC stands for High Performance Computing and is synonymous with the more colloquial term Supercomputer. In turn, a Supercomputer is a somewhat loosely defined umbrella term for a computer that can perform computations and other information processing tasks much more quickly than a typical computing device we use in everyday life (e.g. a laptop or mobile phone). Typically supercomputers are assembled as clusters, or collections of powerful computer servers interconnected with fast network connections. Each server in a cluster is often referred to as a Compute Node. Each of these servers or nodes is essentially a workstation, though typically much more capable. For example, a standard laptop these days might have a CPU with 4-8 cores and 8-16 GB of RAM. Compare this with a standard compute node on the Shabyt cluster, which has a whopping 64 CPU cores and 256 GB of RAM. In addition, some of the compute nodes on Shabyt feature powerful GPU accelerators (Nvidia V100), which in certain tasks may perform number crunching at speeds that exceed those of CPUs by a factor of 5x-10x or even more.

HPC cluster is a shared resource

Another main difference between a supercomputer and a personal laptop or desktop is that the supercomputer is a Shared Resource. This means there may be tens or even hundreds of users who access the supercomputer simultaneously. Each of them can connect to the HPC cluster from their own personal computer and run (or schedule) jobs on one or more of the cluster's compute nodes. You can probably guess that this shared resource model requires some form of coordination. Otherwise, chaotic execution of computational tasks may lead to serious inefficiency and logistical disasters. This is why pretty much all supercomputers use Job Schedulers - software that controls the execution of tasks and makes sure the system is not overcommitted at any given time. Job Schedulers may also handle different users and tasks according to predefined priority policies, thereby preventing unintended or unfair sharing of precious computing resources.

Role of the job scheduler

A job scheduler (also known as a workload manager) is software used to manage the execution of user jobs. On all our HPC facilities at NU, we have deployed a scheduler called SLURM - a free and open-source job scheduler for Linux and Unix-like systems, used in many, if not most, supercomputers and computer clusters found in universities, research institutions, and commercial companies across the world. Users invoke SLURM by writing a Batch Script that requests a certain amount of compute resources (e.g., CPUs, RAM, GPUs, compute time) and includes the instructions for running their code. Users submit their scripts to the job scheduler, which then finds available resources on the supercomputer for each user's job. When the resources needed for a specific job become available, the scheduler initiates the commands included in the batch script and writes the results to a text file (roughly the equivalent of the screen output you would see when running the program interactively).
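
For illustration, a minimal SLURM batch script might look like the sketch below. The job name, requested resources, module, and script name are placeholders, and the exact options and modules available on NU systems may differ; consult the site documentation and the SLURM manual pages.

 #!/bin/bash
 #SBATCH --job-name=my_job          # name shown in the queue
 #SBATCH --ntasks=1                 # a single task (serial job)
 #SBATCH --cpus-per-task=4          # CPU cores allocated to that task
 #SBATCH --mem=8G                   # memory for the whole job
 #SBATCH --time=02:00:00            # wall-clock time limit (hh:mm:ss)
 #SBATCH --output=my_job_%j.out     # file that captures the screen output (%j = job ID)
 # Load the software environment (module name is a site-specific placeholder)
 module load python
 # Run the actual computation
 python my_analysis.py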

Benefits of HPC - scaling up and automation

Supercomputers provide opportunities for parallel processing and data storage that greatly surpass what is possible on a standard laptop or desktop computer. This gives the ability to scale up simulations (e.g. use higher resolution or increase the size/complexity of the model). Other types of analyses may benefit not from increased complexity of the models but from the mere fact that one can execute more jobs at the same time. Common laptop/desktop machines are limited by the relatively small number of CPU cores accessible to them, which restricts the number of simultaneous computations compared to an HPC system.
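
As a hedged illustration of the "more jobs at the same time" benefit: a SLURM job array lets one batch script launch many independent copies of the same analysis, each working on its own input. The script and data file names below are hypothetical.

 #!/bin/bash
 #SBATCH --job-name=param_sweep     # hypothetical parameter-sweep job
 #SBATCH --array=1-10               # ten independent array tasks
 #SBATCH --ntasks=1                 # each array task is a small serial job
 #SBATCH --time=01:00:00            # wall-clock limit per array task
 #SBATCH --output=sweep_%A_%a.out   # %A = array job ID, %a = array task index
 # Each array task processes its own input file; the tasks can run concurrently
 # on whatever compute nodes the scheduler finds free.
 python analyze.py --input data_${SLURM_ARRAY_TASK_ID}.csv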

Another benefit of using HPC is automation. Automation is a feature of HPC systems that allows users to schedule jobs ahead of time; these jobs are then run without supervision. Keeping a workstation running or an SSH terminal active while scripts execute can lead to many inconveniences and complications during extended analyses. By contrast, batch scripts allow a prewritten set of instructions to be executed when the scheduler determines that sufficient resources are available. This allows long-running jobs to execute for many days (the actual time limit is imposed by the policy set by the administrators). Meanwhile, the real-time output is saved to a file, allowing the user to check the progress of the job. Lastly, the user can set up Checkpointing if a job requires execution longer than 10 days.
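
In practice, the unattended workflow looks roughly like the sketch below; the script and output file names are placeholders, and the job ID is printed by sbatch when you submit.

 sbatch my_job.slurm          # submit the batch script; SLURM prints the assigned job ID
 squeue -u $USER              # check whether the job is still pending or already running
 tail -f my_job_12345.out     # follow the output file as the job writes to it (12345 = your job ID)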

Common misconceptions

If I move my code/software from a desktop computer to HPC cluster, it will automatically run faster

You might naively assume that if you just move your code from a laptop/desktop computer to a supercomputer, it will automatically run faster. That is not always the case. In fact, you might be surprised to learn that sometimes it can run even slower. This is particularly true for serial jobs. The power of a supercomputer comes not from the clock speed of a single CPU core (which is typically not very high) but from the sheer volume of resources available - many more CPU cores in each node, multiple nodes that can be used, larger amounts of memory, availability of GPUs, etc. Most often, performance boosts come from rebuilding or optimizing your code to take advantage of parallelism, i.e. the additional CPU cores available on the HPC. Parallelization enables jobs to divide-and-conquer independent tasks when multiple threads or parallel processes are executed. However, on the HPC, parallelization must almost always be explicitly coded or configured and called from your job. It is not automatic. This process is highly software-dependent, so you will want to research the proper method for running your program of choice in parallel.
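
As a rough illustration of "explicit, not automatic", compare the two fragments below, which would sit inside a batch script that requested several cores. The process command and input files are hypothetical, and the availability of GNU parallel on the cluster is an assumption.

 # Serial: handles the files one after another and keeps a single core busy,
 # no matter how many cores the job was allocated.
 for f in input_*.dat; do
     ./process "$f"
 done
 # Explicitly parallel: GNU parallel runs up to $SLURM_CPUS_PER_TASK copies
 # of the same command at once, one per allocated core.
 parallel -j "${SLURM_CPUS_PER_TASK}" ./process {} ::: input_*.dat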

If I allocate more CPU cores to my job, my software will use them and the performance will scale up accordingly

Running a job with a large number of CPU cores when the software has not been configured to use them is a waste of your allocation, your time, and precious HPC resources. Software must be designed to use multiple CPU cores as part of its execution, so you will need to ensure your program actually has that capability and is told to use the cores you requested. The job scheduler only allocates the resources you asked for; it is your responsibility to ensure that the code itself can use them as intended and take advantage of parallelism.
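
A minimal sketch of what "telling the software" can look like, assuming an OpenMP-threaded program; the program name and the --threads flag are hypothetical, so check your software's documentation for its own way of setting the thread or core count.

 #SBATCH --cpus-per-task=16                    # ask the scheduler for 16 cores ...
 # ... and explicitly tell the software to use them. For an OpenMP program:
 export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
 ./my_openmp_program
 # Many programs take their own flag instead, for example:
 # ./my_program --threads ${SLURM_CPUS_PER_TASK}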

If I run my job on a node that has GPU(s), it will automatically use them and run faster

The power of GPU computing, just like the power of using multiple CPU cores, comes from parallelism. GPUs typically have many thousands of specialized cores. In order to take advantage of them, the code must have the capability to use them (e.g. through a software stack for a specific GPU architecture, such as Nvidia CUDA). It is not automatic.
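
A hedged sketch of a GPU job: the GPU has to be requested from the scheduler, the CUDA environment loaded, and the code itself must have been written or built to use it. The partition and module names below are placeholders; check the NU documentation for the actual ones.

 #SBATCH --partition=gpu            # hypothetical name of a GPU partition
 #SBATCH --gres=gpu:1               # request one GPU for the job
 module load cuda                   # CUDA toolchain; the exact module name is site-specific
 nvidia-smi                         # lists the GPU(s) visible to the job
 python train.py                    # runs faster only if the code was written to use CUDA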

All nodes on a supercomputer are the same

NU HPC facilities are equipped with different types of nodes. For example, the login node is available to all users by default upon login. It is designed for managing and editing files, compiling code, and interacting with the job scheduler. The login node is not designed to run production computations. In fact, running jobs that are too computationally intensive on the login node can severely impact performance for other users and is prohibited by our policies; such jobs will be noticed and stopped by the HPC systems team. Heavy computations must instead be submitted to the job queue.

Types of nodes on an HPC system typically include the Bastion Host, the Login Node, the Compute Nodes, and the Data Transfer Node. See Compute Resources for information on the compute hardware available.