Quick Start: Difference between revisions
No edit summary |
No edit summary |
||
Line 5: | Line 5: | ||
==== HPC cluster is a shared resource ==== | ==== HPC cluster is a shared resource ==== | ||
Another main difference between a supercomputer and a personal laptop or desktop is that the supercomputer is a S''hared Resource''. This means there may be tens or even hundreds of users who simultaneously access the supercomputer. Each of them can connect to the HPC cluster from their own personal computer and run (or schedule) | Another main difference between a supercomputer and a personal laptop or desktop is that the supercomputer is a S''hared Resource''. This means there may be tens or even hundreds of users who simultaneously access the supercomputer. Each of them can connect to the HPC cluster from their own personal computer and run (or schedule) jobs on one or more of the cluster's compute nodes. You can probably guess that this shared resource model requires some form of coordination. Otherwise, a chaotic execution of computational tasks may lead to serious inefficiency and logistical disasters. This is why pretty much all supercomputers use ''Job Schedulers'' - a software that controls execution of tasks and makes sure the system is not overcommitted at any given time. Job Schedulers may also handle different users and tasks according to predefined priority policy thereby preventing unintended or unfair share of precious computer resources. |
Revision as of 20:48, 28 June 2024
This Quick Start Tutorial is meant to provide a very short introduction for those who are new to High Performance Computing, or simply wish to take a refresher of the basics. It covers some concepts that are general to HPC, explains its basic philosophy, and should let you decide whether and how you can deploy it in your research.
What is HPC?
HPC stands for High Performance Computing and is synonymous with the more colloquial term Supercomputer. In turn, a Supercomputer is a somewhat loosely defined umbrella term that means a computer that is capable to perform computations and other information processing tasks much more quickly than a typical computing device we use in everyday life (e.g. a laptop or mobile phone). Typically supercomputers are assembled as clusters, or collections of powerful computer servers interconnected with fast network connections. Each server in a cluster is often referred to as a Compute Node. Each of the servers or nodes is essentially a workstation though typically much more capable. For example, a standard laptop these days might have a CPU with 4-8 cores and 8-16 GB of RAM. Compare this with a standard compute node on Shabyt cluster, which has a whopping 64 CPU cores and 256 GB of RAM. In addition, some of the compute nodes on Shabyt feature powerful GPU accelerators (Nvidia V100), which in certain tasks may perform number crunching at speeds that exceed that of CPUs by a factor of 5x-10x or even more.
Another main difference between a supercomputer and a personal laptop or desktop is that the supercomputer is a Shared Resource. This means there may be tens or even hundreds of users who simultaneously access the supercomputer. Each of them can connect to the HPC cluster from their own personal computer and run (or schedule) jobs on one or more of the cluster's compute nodes. You can probably guess that this shared resource model requires some form of coordination. Otherwise, a chaotic execution of computational tasks may lead to serious inefficiency and logistical disasters. This is why pretty much all supercomputers use Job Schedulers - a software that controls execution of tasks and makes sure the system is not overcommitted at any given time. Job Schedulers may also handle different users and tasks according to predefined priority policy thereby preventing unintended or unfair share of precious computer resources.