Policies

From NU HPC Wiki
Revision as of 11:09, 17 September 2024 by Admin

Important Note: Software configurations on NU HPC facilities are updated on a continuous basis. Minor policy changes also occur regularly. Some of these changes might not be immediately reflected on this website. The limits on job execution and maximum storage allocations are subject to change based on decisions made by the NU HPC Committee and actual system utilization.

Acceptable Use

The HPC system is a unique resource for NU researchers and the community. It has special characteristics, such as a large amount of RAM and the capability for massive parallelism. Due to its uniqueness and expense, its use is supervised by the HPC team to ensure efficient and fair utilization.

Storage quotas

Home directory

Users’ home directories are physically stored on fast SSD arrays that have very high bandwidth and endurance.

On the Shabyt cluster, the main storage servers are connected to the system via InfiniBand interfaces (100 Gbit/s). All compute nodes are also connected via InfiniBand. This gives users very high bandwidth both when they access their data from the login node and when their jobs run on compute nodes under SLURM.

On the Muon cluster, the main SSD storage resides in the login node, with all SSDs connected via fast U.2 interfaces. However, keep in mind that Muon's compute nodes have limited bandwidth to the login node (1 Gbit/s Ethernet). Batch jobs therefore cannot read or write data faster than this.
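As a rough illustration of why the interconnect matters for I/O-heavy jobs, the sketch below estimates the minimum time needed to move a dataset over a link of a given nominal rate. The function name, the 50 GB dataset size, and the two link rates are illustrative assumptions rather than measured values. Real transfers are slower because of protocol overhead and contention with other users.

```python
def min_transfer_seconds(size_gb: float, link_gbit_per_s: float) -> float:
    """Lower bound on transfer time, ignoring protocol overhead and contention.

    size_gb is in decimal gigabytes (1 GB = 10**9 bytes);
    link_gbit_per_s is the nominal link rate in gigabits per second.
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link rate must be positive")
    size_bits = size_gb * 8e9
    return size_bits / (link_gbit_per_s * 1e9)


# Illustrative inputs only: a hypothetical 50 GB dataset on two link speeds.
for rate in (1.0, 100.0):
    print(f"{rate:g} Gbit/s: at least {min_transfer_seconds(50, rate):.0f} s")
```

If a job repeatedly reads the same large input, an estimate like this can help decide whether it is worth restructuring the job to read the data once and keep it in memory.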

Current default quota for users’ home directories on NU HPC systems
System | Path | Default storage limit
Shabyt cluster | /shared/home/<username> | 100 GB
Muon cluster | /shared/home/<username> | 100 GB
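To get a feel for how close a home directory is to the default limit, a minimal sketch along the following lines could be used. It simply sums file sizes under a directory. The function names are hypothetical, the quota is assumed to be counted in decimal gigabytes, and the way the cluster actually accounts for usage (for example, by allocated blocks) may differ, so treat the result as an approximation.

```python
import os


def directory_usage_bytes(root: str) -> int:
    """Sum the sizes of all files under root (symbolic links are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
    return total


def quota_fraction(used_bytes: float, quota_gb: float = 100.0) -> float:
    """Fraction of the quota in use, assuming 1 GB = 10**9 bytes (an assumption)."""
    return used_bytes / (quota_gb * 1e9)


# Example call (walking a large home directory can take a while):
#   used = directory_usage_bytes(os.path.expanduser("~"))
#   print(f"{used / 1e9:.1f} GB used, {quota_fraction(used):.0%} of the default quota")
```

Walking a directory tree with many small files puts load on the storage servers, so a check like this is best run occasionally rather than inside a job loop.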

Additional storage - zdisk

If a user requires more storage for their work, it can be allocated through a special request to the HPC admins. For particularly large, multi-terabyte storage needs, Shabyt has an HDD array with a total capacity of 120 TB.

Data integrity and backup

Please be advised that users take full responsibility for the integrity and safety of their data stored on NU HPC facilities. While our clusters feature enterprise-level hardware, failures are still a possibility. We back up data in user home directories automatically several times a week (note that this applies only to your home directory in /shared/home, not to the group storage allocations in /zdisk). However, if a major hardware failure occurs, you may not have access to your data for a prolonged period while the system is being repaired, even if the data is eventually restored. In some unfortunate situations it might take many days or even weeks to get everything back. Moreover, no system or storage solution is 100% reliable. We therefore highly recommend that you back up your most important data to your personal computer from time to time.
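One possible way to keep a personal copy of important results is to pull them with rsync from your own computer. The sketch below only builds and prints such a command. The user name, host name, and directory names are hypothetical placeholders, and the helper function is not part of any NU HPC tooling. It assumes rsync and SSH access to the login node are available on your machine.

```python
import shlex


def build_rsync_command(user: str, host: str, remote_dir: str, local_dir: str) -> list[str]:
    """Build an rsync command that pulls remote_dir from the cluster into local_dir.

    -a preserves permissions and timestamps, -z compresses data in transit,
    and --partial keeps partially transferred files so an interrupted run can resume.
    """
    # The trailing slash makes rsync copy the directory contents, not the directory itself.
    source = f"{user}@{host}:{remote_dir.rstrip('/')}/"
    return ["rsync", "-az", "--partial", source, local_dir]


# Hypothetical values: replace the user name, host, and paths with your own.
cmd = build_rsync_command(
    "jdoe", "login.example.org", "/shared/home/jdoe/results", "./results_backup"
)
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal on your personal computer, or passed to `subprocess.run(cmd, check=True)` to run it from Python. Because rsync transfers only files that have changed, repeated runs are usually much faster than the first one.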

Queues and the number of jobs

Currently, Shabyt has two partitions for user jobs. Because the system is still being configured and fine-tuned, there is at present no hard limit on the number of jobs an individual user can submit. This will likely change in the near future.
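Since partition names and job limits may change, it can be useful to query SLURM directly rather than rely on this page. The sketch below wraps the standard `sinfo` and `squeue` commands. The helper names are hypothetical, and the `summarize` function only works on a machine where the SLURM client tools are installed, such as a login node.

```python
import getpass
import subprocess


def parse_partitions(sinfo_output: str) -> list[str]:
    """Return unique partition names from `sinfo -h -o %P` output.

    SLURM marks the default partition with a trailing '*', which is stripped here.
    """
    names: list[str] = []
    for line in sinfo_output.splitlines():
        name = line.strip().rstrip("*")
        if name and name not in names:
            names.append(name)
    return names


def count_jobs(squeue_output: str) -> int:
    """Count entries in `squeue -h -o %i` output (one job ID per non-empty line)."""
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def run(cmd: list[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def summarize() -> None:
    """Print the available partitions and the current user's number of queued or running jobs."""
    user = getpass.getuser()
    print("Partitions:", parse_partitions(run(["sinfo", "-h", "-o", "%P"])))
    print("My jobs:", count_jobs(run(["squeue", "-h", "-u", user, "-o", "%i"])))
```

Calling `summarize()` on a login node prints the partition list and your current job count. Keeping the parsing separate from the command invocation makes the helpers easy to check without access to a cluster.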

Acknowledgments

If the computational resources provided by NU HPC facilities were an essential tool in research that resulted in a publication, we ask that you include an acknowledgment in it. A natural place for it is the section where you would typically acknowledge funding sources. Two of many possible formats for this acknowledgment are as follows:

The authors acknowledge the use of computational facilities provided by the Nazarbayev University Research Computing.

A.B. and C.D. (author initials) acknowledge the use of the Shabyt HPC cluster at Nazarbayev University Research Computing.