Quick Start
This Quick Start Tutorial offers a concise introduction for newcomers to High Performance Computing, as well as a refresher for those seeking to revisit the fundamentals. It introduces key concepts, outlines the core philosophy of HPC, and helps you determine whether — and how — HPC can be applied to your research.
Overview
What is HPC?
HPC stands for High Performance Computing and is often used interchangeably with the term supercomputing. A supercomputer is generally understood to be a system capable of performing computations and processing information far more quickly than the devices we use in everyday life, such as laptops or mobile phones. Most supercomputers are built as clusters — collections of powerful computer servers interconnected by high-speed networks. Each server in a cluster is called a compute node. While a compute node is conceptually similar to a workstation, it is usually much more powerful. For instance, a typical laptop today may have 4–8 CPU cores and 8–16 GB of RAM, whereas standard compute nodes on our Shabyt and Irgetas clusters feature 64–192 CPU cores and 256–384 GB of RAM. Moreover, some nodes on the Shabyt and Irgetas clusters are equipped with advanced GPU accelerators (Nvidia V100 and Nvidia H100). For certain workloads, each of these GPUs can outperform a server CPU by a factor of 5–10 (i.e. an order of magnitude) or even more, making them especially valuable for tasks involving large-scale number crunching.
One of the key differences between a supercomputer and a personal laptop or desktop is that an HPC cluster is a shared resource. This means that dozens of users may access the system at the same time. Each user connects from their own computer and submits jobs to run on one or more of the cluster’s compute nodes. Because of this shared model, coordination is essential. Without it, uncontrolled execution of tasks would quickly lead to inefficiency and system instability. To avoid this, virtually all supercomputers rely on job schedulers — specialized software that manages when and where tasks run, ensuring that resources are used effectively and not overcommitted. In addition, job schedulers apply predefined policies to balance workloads and enforce fairness. These policies prioritize jobs according to factors such as user group, project importance, or requested resources, helping to prevent bottlenecks and ensuring equitable access to the cluster’s computing power.
Role of the Job Scheduler
A job scheduler (also called a workload manager) is software that coordinates the execution of user jobs on an HPC system. At Nazarbayev University, all our HPC facilities use SLURM — a free, open-source scheduler for Linux and Unix-like systems. SLURM is one of the most widely adopted workload managers worldwide, powering many supercomputers and clusters in universities, research institutions, and industry. Users interact with SLURM by writing a batch script. This script specifies the compute resources required (such as CPUs, memory, GPUs, and wall time) and contains the commands needed to run the user’s code. Once submitted, the scheduler places the job in a queue and allocates resources as they become available. When the requested resources are assigned, SLURM executes the commands in the batch script and records the output in a text file, serving as the equivalent of screen output.
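To make this concrete, here is a minimal sketch of what such a SLURM batch script might look like. The resource values are arbitrary examples, and the job name, output file, and the executable ./my_program are placeholders; the options your job actually needs depend on your application and on the limits configured on our clusters.

```bash
#!/bin/bash
#SBATCH --job-name=my_first_job        # name shown in the queue
#SBATCH --nodes=1                      # run on a single compute node
#SBATCH --ntasks=1                     # one task (process)
#SBATCH --cpus-per-task=4              # CPU cores reserved for that task
#SBATCH --mem=8G                       # memory for the whole job
#SBATCH --time=01:00:00                # wall-time limit (hh:mm:ss)
#SBATCH --output=my_first_job.%j.out   # output file (%j = job ID)

# The commands below run on the allocated compute node once the job starts.
echo "Running on $(hostname)"
./my_program input.dat                 # hypothetical executable and input file
```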
Benefits of HPC: Scaling Up and Automation
Supercomputers offer levels of parallel processing and data storage far beyond what a standard laptop or desktop can provide. This enables researchers to scale up their work — for example, by running simulations at higher resolution or modeling systems of greater size and complexity. In other cases, the benefit comes not from increasing model complexity but from simply being able to run many jobs simultaneously. Unlike personal machines, which are limited by the small number of CPU cores available, HPC clusters provide access to hundreds or even thousands of cores, allowing for far more parallel computations.
Another major advantage of HPC is automation. Users can schedule jobs in advance, and the system executes them without supervision. On a personal workstation, long analyses often require keeping a terminal session open, which is inconvenient and error-prone. In contrast, HPC uses batch scripts — prewritten sets of instructions that the scheduler runs once the required resources are available. This makes it possible to carry out jobs lasting several days (subject to time limits set by administrators), with all output automatically written to files so progress can be monitored. For very long workloads, HPC systems may also support checkpointing, a mechanism that saves the current state of a job so it can be resumed later if the run exceeds the allowed time limit or is interrupted.
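In practice, the workflow is fully non-interactive: you submit the script, log off if you wish, and check back later. The sketch below assumes the script above was saved as my_first_job.sh and that SLURM assigned it the hypothetical job ID 12345.

```bash
# Submit the batch script; SLURM prints the assigned job ID.
sbatch my_first_job.sh

# Check the state of your queued and running jobs.
squeue -u $USER

# Follow the output file while the job runs (name set by the --output directive).
tail -f my_first_job.12345.out
```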
Common Misconceptions
“If I move my code from a desktop computer to an HPC cluster, it will automatically run faster.”
It is a common misconception that simply transferring code from a laptop or desktop to a supercomputer will guarantee faster performance. In reality, that is not always the case. In fact, for some workloads, especially serial jobs, execution may even be slower. This is because the strength of a supercomputer does not lie in the clock speed of individual CPU cores (which are often comparable to or slower than those in personal computers). Instead, performance gains come from the scale of resources available: many more CPU cores per node, multiple interconnected nodes, large memory pools, and specialized accelerators such as GPUs. To benefit from these resources, code must typically be rebuilt or optimized to exploit parallelism. Parallelization allows tasks to be split across multiple CPU cores, threads, or processes, enabling a “divide-and-conquer” approach. However, this does not happen automatically. Parallel execution must be explicitly implemented in the software or configured in the job submission process. Since the exact method depends on the software, users should consult documentation or best practices for their specific application to ensure it can run efficiently on an HPC system.
“If I allocate more CPU cores to my job, my software will automatically use them and performance will scale up.”
Requesting a large number of CPU cores for a job does not guarantee faster performance. If the software has not been designed or configured to use multiple cores, those extra resources will simply sit idle — wasting both your allocation and valuable HPC capacity. The job scheduler’s role is only to reserve the resources you request; it does not make your code parallel. To benefit from multiple cores, the software itself must support parallel execution (through multithreading, multiprocessing, or other parallelization techniques), and you must run it with the correct configuration or command-line options. In short, before requesting many cores, make sure your application is capable of using them efficiently.
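As an illustration, the sketch below shows one common pattern for a multithreaded (OpenMP-style) application: the number of reserved cores is passed to the program through an environment variable. The program name is a placeholder, and other applications use their own mechanisms (for example a command-line flag for the thread or process count), so always check the documentation of your software.

```bash
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16             # reserve 16 cores for one multithreaded process
#SBATCH --time=02:00:00

# Tell an OpenMP-based application how many threads it may actually use.
# Without this (or an equivalent application-specific option), the extra
# cores reserved above may simply remain idle.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

./my_threaded_program input.dat        # hypothetical multithreaded executable
```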
“If I run my job on a node that has GPU(s), it will automatically use them and run faster.”
The power of GPU computing, just like the power of using multiple CPU cores, comes from parallelism. GPUs typically contain many thousands of specialized cores, and to take advantage of them the code must be capable of using them, for example through a software stack targeting a specific GPU architecture, such as Nvidia CUDA. This does not happen automatically. It does not happen automatically even if your code already implements some form of parallelism suitable for multi-core or multi-node CPU computing, because the software stacks that enable parallel computing on a GPU are generally different.
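As a rough sketch, a job that intends to use a GPU has to request one explicitly, and the application must be built against the corresponding GPU software stack. The partition name and executable below are placeholders; the partitions actually configured on our systems can be listed with sinfo.

```bash
#!/bin/bash
#SBATCH --partition=gpu                # hypothetical name of the GPU partition
#SBATCH --gres=gpu:1                   # request one GPU on the node
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00

# Show which GPU was allocated to the job (a useful sanity check).
nvidia-smi

# The application itself must be built with GPU support (e.g. against CUDA);
# requesting a GPU above does not make a CPU-only binary use it.
./my_gpu_program input.dat             # hypothetical GPU-enabled executable
```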
“All nodes on a supercomputer are the same.”
NU HPC facilities are equipped with different types of nodes. For example, the login node is available to all users by default upon login. It is designed for managing and editing files, compiling code, and interacting with the job scheduler. The login node is not designed to run production computations. In fact, running computationally intensive jobs on the login node can severely degrade performance for other users and system processes, and it is prohibited by our policies. Instead, all heavy computations must be submitted to a job queue; jobs are then automatically distributed among the compute nodes by the job scheduler. The compute nodes available on NU HPC facilities are not all the same either. For example, on Shabyt there are two types of compute nodes: those equipped with CPUs only and those that also have GPUs. Moreover, the CPU models in these two types of nodes differ: while the number of CPU cores is the same in all compute nodes of Shabyt, the GPU nodes feature CPUs with a somewhat lower clock speed. To separate jobs intended for CPU and GPU nodes, the scheduler is configured with two different partitions. It is the responsibility of users to submit their jobs to the correct SLURM partition. For example, if a job makes no use of GPUs, it should be submitted to a CPU partition so that the expensive GPUs do not sit idle and can instead be used by those whose software is capable of taking advantage of them. Information about the hardware configuration of the nodes is available on the Systems page.
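For instance, you can list the available partitions and then direct your job to the appropriate one at submission time. The partition names cpu and gpu below are illustrative only; use the names actually reported by sinfo on Shabyt or Irgetas.

```bash
# List the partitions configured on the cluster, their nodes, and their limits.
sinfo

# Direct a CPU-only job to the CPU partition (hypothetical partition name).
sbatch --partition=cpu my_cpu_job.sh

# Only jobs that can actually use a GPU should go to the GPU partition.
sbatch --partition=gpu my_gpu_job.sh
```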
“I can run my tasks interactively on a compute node (e.g. play with my Jupyter notebook).”
In principle, interactive access and interactive execution can be realized on an HPC cluster. On sufficiently large systems this is sometimes enabled by dedicating a certain number of nodes to interactive work or by preempting currently running jobs. However, this approach is not quite consistent with the general philosophy of HPC, which aims for highly efficient utilization of expensive computational resources. Indeed, if a user steps through cells of a Jupyter notebook or types Matlab commands into a terminal one by one, the allocated CPUs sit idle for a significant fraction of the time. A better approach is to do all code development, interactive exploration, and debugging on a workstation, make sure everything works as intended, and then execute the heavy production calculations as a batch job on a supercomputer.
If you absolutely do need to run something interactively on an HPC cluster, there is a way to do it on our systems. Please see the Software section, which explains how this can be achieved with SLURM. However, be advised that this may involve a long wait time.
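As a rough sketch, an interactive session is typically requested through SLURM itself, for example with srun. The resource values below are placeholders, and the recommended procedure and options for our systems are described in the Software section.

```bash
# Request an interactive shell on a compute node: 4 cores, 8 GB of memory,
# for at most one hour. The prompt appears only after SLURM finds free
# resources, which may take a while on a busy system.
srun --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash -i

# ...work interactively on the compute node, then release the allocation:
exit
```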
“I cannot install my own software.”
Well, it depends. You cannot install software that requires sudo privileges, and you cannot install software system-wide. However, you can build and install software in your own home directory; nothing prohibits that. You can also create custom environments and install packages for languages like Python and R using their built-in package managers, or use EasyBuild (a specialized framework that automates software installations) to build packages of your choice. Keep in mind that any software installed in your home directory counts against your disk quota.
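For example, a user can set up a personal Python environment entirely inside their home directory. The module name and package list below are illustrative; check module avail (or the Software section) to see what is actually provided on our systems.

```bash
# Load a Python installation provided on the cluster
# (the module name "Python" is illustrative; see `module avail`).
module load Python

# Create an isolated environment inside your home directory.
python -m venv $HOME/myproject-env

# Activate it and install packages locally; no sudo is required, and
# everything installed here counts against your home-directory disk quota.
source $HOME/myproject-env/bin/activate
pip install numpy pandas
```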
The system-wide installation of software and creation of modules are generally taken care of by the HPC administrators. If you would like something to be installed as software available to all HPC users, you can make a request through the Helpdesk ticketing system.
“I can use an HPC cluster for immediate real-time processing of data fed continuously from external sources.”
If you need to perform real-time processing of data arriving continuously from external sources (for example, running AI image recognition on 24/7 astronomical observations in order to trigger an action on your telescope within a second; collecting, processing, and storing medical patient data streaming from hospitals across the country; or processing a large volume of bank transactions in real time), then you must deploy a dedicated server or use a suitable cloud service provider. In contrast, virtually all compute jobs on an HPC cluster are executed in the background, when and only when the resources for that job become available. Because jobs wait in a queue, it is generally impossible to predict exactly when your computations will start and finish. This scheduling and background execution is precisely what enables maximum utilization of the expensive HPC equipment (up to 100%, provided that users submit enough jobs). Dedicated mission-critical servers, by contrast, must always be available for one specific task and therefore must overprovision resources for it, which leads to underutilization over extended periods of time.