Policies

'''Important Note:''' Software configurations on NU HPC facilities are updated on a continuous basis. Minor policy changes also occur regularly. Some of these changes might not be immediately reflected on this website. The limits on job execution and maximum storage allocations are subject to change based on decisions made by the NU HPC Committee and actual system utilization.

== Acceptable use ==
The HPC system is a unique resource for NU researchers and the community. It has special characteristics, such as a large amount of RAM and the capability for massive parallelism. Due to its uniqueness and expense, its use is supervised by the HPC team to ensure efficient and fair utilization.

Users are accountable for their actions. It is the responsibility of PIs to ensure that their group members have the necessary expertise to use NU HPC facilities properly and use them for research purposes only. Intentional misuse of NU HPC resources or noncompliance with our Acceptable Use Policy can lead to temporary or permanent disabling of accounts, and to administrative or even legal action.

== Storage quotas ==

=== Home directory ===
Users’ home directories are physically stored on fast SSD arrays with very high bandwidth and enterprise-class endurance of the flash drives.

In the Irgetas and Shabyt clusters, the main storage servers are connected to the system via InfiniBand, as are all compute nodes. This provides very high bandwidth both when users access their data from the login node and when their jobs run on the compute nodes under SLURM.

In the Muon cluster, the main SSD storage is in the login node, with all SSDs connected via fast U.2 interfaces. However, Muon's compute nodes have limited bandwidth to the login node (1 Gbit/s Ethernet), so batch jobs cannot read and write data faster than this network bandwidth.
{| class="wikitable"
|+Default quota for users’ home directories on NU HPC systems
!System
!Path
!Default storage limit
|-
|Irgetas
|<code>/home/<username></code>
|400 GB
|-
|Shabyt
|<code>/shared/home/<username></code>
|100 GB
|-
|Muon
|<code>/shared/home/<username></code>
|250 GB
|}
In some exceptional cases, users may be granted a higher storage quota in their home directories. An increased limit must be requested via the Helpdesk ticketing system. Such requests are reviewed individually and approved only in exceptional cases.


=== Checking your storage quota ===
On Shabyt and Muon, the following terminal commands show your own or a group member's storage quota in the home directory, as well as how much of it is actually being used:


<code>beegfs-ctl --getquota --uid $(id -u)</code>

<code>beegfs-ctl --getquota --uid $(id -u <username>)</code>
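
If a PI wants a quick overview of usage across their whole group, the two commands above can be wrapped in a short loop. The sketch below is an unofficial example; it assumes that your research group corresponds to your primary Unix group on the cluster, which may not hold for every account.

<pre>
#!/bin/bash
# Report home-directory quota usage for every member of the caller's primary
# Unix group (assumption: the Unix group matches the research group).
GROUP=$(id -gn)
for user in $(getent group "$GROUP" | cut -d: -f4 | tr ',' ' '); do
    echo "=== $user ==="
    beegfs-ctl --getquota --uid "$(id -u "$user")"
done
</pre>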


=== Additional storage - zdisk, datahub ===

In the Shabyt cluster, users can store larger amounts of data in their group directory on a slower HDD array. Keep in mind that this array is not connected via InfiniBand, so data access and transfer speeds from both the login node and the compute nodes are limited to standard 1 Gbit/s Ethernet speeds. In <code>/zdisk</code>, each research group has a shared allocation, which is particularly handy when data needs to be transferred, exchanged, or shared within a research group. Similarly, in the Irgetas cluster there is a shared directory called <code>/datahub</code>, where each research group has an allocation for additional storage on an external HDD array.
{| class="wikitable"
|+Default storage quota for group directories (zdisk, datahub) on NU HPC systems
!System
!Path
!Default storage limit
|-
|Irgetas
|<code>/datahub/<researchgroupname></code>
|1 TB
|-
|Shabyt
|<code>/zdisk/<researchgroupname></code>
|1 TB
|-
|Muon
|<code>/zdisk/<researchgroupname></code>
|1 TB
|}
Again, in exceptional cases individual users or groups may be granted an increased quota. Such requests are reviewed on an individual basis upon receiving a ticket from the PI via [https://helpdesk.nu.edu.kz/support/catalog/items/272 NU Helpdesk].
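
Because the group allocation is shared, standard Unix group permissions determine who in the group can read and write the files placed there. The commands below are only an illustration; <code>mygroup</code> and <code>myproject</code> are placeholder names, and on Irgetas the path would start with <code>/datahub</code> instead of <code>/zdisk</code>.

<pre>
# Copy results into the shared group allocation ("mygroup" and "myproject" are placeholders).
cp -r ~/results /zdisk/mygroup/myproject

# Make the copied files readable and writable by the other group members.
chmod -R g+rw /zdisk/mygroup/myproject
</pre>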
 


== Data integrity and backup ==
Users are fully responsible for the integrity and safety of their data stored on NU HPC facilities. Although our clusters employ enterprise-grade hardware, failures remain possible. Home directories (<code>/shared/home</code>) are automatically backed up several times per week. Please note that this policy does not cover group storage allocations in <code>/zdisk</code> and <code>/datahub</code>.

In the event of a major hardware failure, access to your data may be unavailable for an extended period while the system is under repair. In some cases, full recovery may take days or even weeks. Furthermore, no storage system is 100% reliable.

For this reason, we strongly recommend that you maintain your own backups of important or irreplaceable data on your personal computer or other secure storage solutions. Regular personal backups will help ensure data safety and minimize disruption in case of unexpected system issues.
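
One convenient way to keep such a personal backup current is <code>rsync</code> over SSH, run from your own computer. The command below is only a sketch: the login-node address <code>shabyt.nu.edu.kz</code>, the username, and the paths are placeholders, so substitute the hostname and credentials you received when your account was created.

<pre>
# Run on your personal computer, not on the cluster.
# Mirror a project directory from your cluster home into a local backup folder
# (hostname, username, and paths are placeholders).
rsync -avz --progress myuser@shabyt.nu.edu.kz:/shared/home/myuser/project/ ~/hpc-backup/project/
</pre>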


== Partitions ==
A partition in SLURM is essentially a queue: a logical grouping of compute nodes that share the same access rules and limits. Users submit jobs to a partition, and SLURM schedules them on nodes belonging to that partition. On NU HPC systems, partitions group compute nodes that have identical hardware.
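
In practice, the partition is selected in the batch script submitted with <code>sbatch</code>. The script below is a minimal sketch rather than a recommended template: the partition name comes from the lists that follow, while the job name, resource counts, and executable are placeholders to replace with your own.

<pre>
#!/bin/bash
#SBATCH --job-name=example          # placeholder job name
#SBATCH --partition=CPU             # target partition (queue), e.g. CPU on Shabyt or ZEN4 on Irgetas
#SBATCH --nodes=1                   # number of nodes
#SBATCH --ntasks-per-node=8         # processes per node
#SBATCH --time=01:00:00             # requested wall time; must not exceed the partition limit

srun ./my_program                   # placeholder executable
</pre>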


=== Irgetas ===
The Irgetas cluster has two partitions available for user jobs:

* <code>ZEN4</code> : This partition includes 10 CPU-only nodes; each node has two 96-core AMD EPYC 9684X CPUs.
* <code>H100</code> : This partition consists of 6 GPU nodes; each node has two 96-core AMD EPYC 9454 CPUs and four NVIDIA H100 GPUs. All Irgetas jobs requiring GPU computations must be queued to this partition. While it is possible to run CPU-only jobs here, users are strongly discouraged from doing so to ensure efficient utilization of the system. Submitting CPU jobs to the H100 partition is justified only if it has been sitting idle for a very long time while the ZEN4 partition is heavily crowded with jobs waiting in the queue.
 
=== Shabyt ===
The Shabyt cluster has two partitions available for user jobs:

* <code>CPU</code> : This partition includes 20 CPU-only nodes; each node has two 32-core AMD EPYC 7502 CPUs.
* <code>NVIDIA</code> : This partition consists of 4 GPU nodes; each node has two 32-core AMD EPYC 7452 CPUs and two NVIDIA V100 GPUs. All Shabyt jobs requiring GPU computations must be queued to this partition (a GPU batch-script sketch follows the Muon subsection below). While it is possible to run CPU-only jobs here, users are discouraged from doing so to ensure efficient utilization of the system. Submitting CPU jobs to the NVIDIA partition is justified only if it has been sitting idle for a very long time while the CPU partition is heavily crowded with jobs waiting in the queue.
 
=== Muon ===
The Muon cluster has a single partition:

* <code>HPE</code> : This partition includes all ten compute nodes, each with a single 14-core Intel Xeon CPU.
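
For jobs in the GPU partitions (<code>H100</code> on Irgetas, <code>NVIDIA</code> on Shabyt), the GPUs themselves must be requested in addition to selecting the partition. The script below is a minimal sketch using standard SLURM options; the GPU count, CPU count, wall time, and executable are placeholders to adapt to your job.

<pre>
#!/bin/bash
#SBATCH --partition=NVIDIA          # GPU partition (use H100 on Irgetas)
#SBATCH --gres=gpu:2                # request 2 GPUs on the node
#SBATCH --ntasks=1                  # a single task driving both GPUs
#SBATCH --cpus-per-task=16          # CPU cores for that task
#SBATCH --time=24:00:00             # within the 48-hour limit of the NVIDIA partition

srun ./my_gpu_program               # placeholder executable
</pre>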
 
== Quality of Service (QoS) ==
Users belonging to different university units and research groups have different limits on how many jobs they can run simultaneously. This is controlled by the Quality of Service (QoS) category in SLURM.
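
If you are unsure which QoS applies to your account, SLURM's accounting tools can show it. The commands below are a minimal sketch using standard <code>sacctmgr</code> and <code>sbatch</code> options; <code>job.sh</code> is a placeholder script, and the exact output depends on each cluster's accounting configuration.

<pre>
# Show the QoS associated with your user account on the current cluster.
sacctmgr show associations user=$USER format=Cluster,Account,User,QOS

# Explicitly request a QoS at submission time (usually unnecessary, since a default applies).
sbatch --qos=nu --partition=CPU job.sh
</pre>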
 
=== Irgetas ===
The Irgetas cluster has four active QoS categories:
 
* <code>hpcnc</code> : Members of research groups that are part of the research cluster called High Performance Computing, Networking, and Cybersecurity (HPCNC), which procured Shabyt
* <code>nu</code> : All other NU researchers (default category)
* <code>issai</code> : Members of the Institute of Smart Systems and Artificial Intelligence
* <code>stud</code> : Students with temporary accounts who take courses related to HPC (e.g. PHYS 421/521/721)
 
=== Shabyt ===
The Shabyt cluster has three active QoS categories: 
 
* <code>hpcnc</code> : Members of research groups that are part of the research cluster called High Performance Computing, Networking, and Cybersecurity (HPCNC), which procured Shabyt
* <code>nu</code> : All other NU researchers (default category)
* <code>stud</code> : Students with temporary accounts who take courses related to HPC (e.g. PHYS 421/521/721)
 
=== Muon ===
The Muon cluster has two active QoS categories: 
 
* <code>hpcnc</code> : Members of research groups that are part of the research cluster called High Performance Computing, Networking, and Cybersecurity (HPCNC), which procured Shabyt
* <code>nu</code> : All other NU researchers (default category)
 
== Job time limits ==
 
The following table lists maximum allowed job durations (wall time) in different partitions of NU HPC systems, as well as key characteristics (RAM, number of cores, number of GPUs) for compute nodes in each partition.


{| class="wikitable"
|+Time limits for jobs in different partitions of NU HPC systems
!System
!Partition
!Max job duration
!Number of nodes available
!Max CPU cores per node
!Max threads per node
!RAM per node
!GPUs per node
|-
|Irgetas
|<code>ZEN4</code>
|7 days (168 hours)
|10
|192
|384
|384 GB
|n/a
|-
|Irgetas
|<code>H100</code>
|4 days (96 hours)
|6
|192
|384
|768 GB
|4
|-
|Shabyt
|<code>CPU</code>
|14 days (336 hours)
|20
|64
|128
|256 GB
|n/a
|-
|Shabyt
|<code>NVIDIA</code>
|2 days (48 hours)
|4
|64
|128
|256 GB
|2
|-
|Muon
|<code>HPE</code>
|14 days (336 hours)
|10
|14
|28
|64 GB
|n/a
|}
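
Much of the same information can be read directly from the scheduler. The command below is a minimal sketch using standard <code>sinfo</code> format fields (partition, time limit, node count, CPUs and memory per node, and generic resources such as GPUs); the column widths are arbitrary.

<pre>
# List each partition with its time limit, node count, CPUs and memory per node, and GPUs (GRES).
sinfo -o "%12P %12l %6D %5c %8m %10G"
</pre>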


== Limits on the number of jobs, cores, threads, and GPUs ==
All limits on the number of simultaneously running jobs, the number of CPU cores, threads, and GPUs in use, and the corresponding job priorities are listed below for all clusters and QoS categories.

{| class="wikitable"
|+Maximum number of simultaneously running jobs, CPU cores, threads, and GPUs per user on NU HPC systems
!System
!QoS
!Partition
!Max simultaneously running jobs per user
!Max CPU cores per user (total for all running jobs)
!Max threads per user (total for all running jobs)
!Max GPUs per user (total for all running jobs)
!Job launch priority (a higher relative value moves up faster in the list of waiting jobs)
|-
|Irgetas
|<code>hpcnc</code>
|<code>ZEN4</code>
|12
|576
|1152
|n/a
|10
|-
|Irgetas
|<code>nu</code>
|<code>ZEN4</code>
|12
|576
|1152
|n/a
|10
|-
|Irgetas
|<code>issai</code>
|<code>ZEN4</code>
|12
|576
|1152
|n/a
|10
|-
|Irgetas
|<code>hpcnc</code>
|<code>H100</code>
|12
|576
|1152
|12
|10
|-
|Irgetas
|<code>nu</code>
|<code>H100</code>
|12
|576
|1152
|12
|10
|-
|Irgetas
|<code>issai</code>
|<code>H100</code>
|12
|1152
|2304
|24
|30
|-
|Shabyt
|<code>hpcnc</code>
|<code>CPU</code>, <code>NVIDIA</code>
|40
|1280
|2560
|8
|10
|-
|Shabyt
|<code>nu</code>
|<code>CPU</code>, <code>NVIDIA</code>
|12
|256
|512
|8
|5
|-
|Shabyt
|<code>stud</code>
|<code>CPU</code>, <code>NVIDIA</code>
|4
|128
|256
|4
|5
|-
|Muon
|<code>hpcnc</code>, <code>nu</code>
|<code>HPE</code>
|40
|140
|280
|n/a
|10
|}
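
The limits configured for each QoS can also be queried on the cluster itself. The command below is a minimal sketch using standard <code>sacctmgr</code> QOS fields; which fields are populated depends on the accounting configuration.

<pre>
# Show per-user job and resource limits, and the scheduling priority, for each QoS.
sacctmgr show qos format=Name,MaxJobsPU,MaxTRESPU,Priority
</pre>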


== Acknowledgments in publications ==
If computational resources provided by Nazarbayev University Research Computing (NU RC) were essential to research reported in a publication, please include an acknowledgment, typically in the same section where funding sources are acknowledged. Two example wordings are given below (feel free to adapt them), but make sure the exact phrase '''Nazarbayev University Research Computing''' appears:

* ''The authors acknowledge the use of computational resources provided by Nazarbayev University Research Computing.''
* ''A.B. and C.D. acknowledge the use of the Irgetas HPC cluster at Nazarbayev University Research Computing.''
