General Cluster Information
Filesystems
NFS home directory
Your home directory, which is an NFS-mounted file system, is one choice for I/O. It has the poorest I/O performance of the available file systems, and using the home directory for job I/O has the potential to impact all users of CHPC general resources. This space is visible to all nodes on the general clusters through an auto-mounting system.
NFS group directory
A second choice for I/O is the CHPC group space IF your group owns a group space. Group spaces are shares of space on a larger group file system (CHPC purchases file systems as needed and sells shares in TB chunks). This is also an NFS mounted file system, and therefore also has performance limitations. The use of your group space has the potential to impact all users from the groups that own a share of a given file system. This space is visible to all nodes on the general environment clusters through an auto-mounting system.
Scratch
All general environment cluster nodes have access to several NFS-mounted scratch file systems, including /scratch/general/lustre with a 700 TB capacity, /scratch/general/nfs1 with 595 TB, and /scratch/general/vast with 1 PB.
Local disk (/scratch/local)
The local scratch space is storage unique to each individual node. It is cleaned aggressively, with files older than 1 week being scrubbed, and can be accessed on each node through /scratch/local. This space is one of the fastest, but certainly not the largest, with the amount available varying between the clusters. Users must remove all of their files from /scratch/local at the end of their calculation.
It is a good idea to stage data between storage systems when running jobs: at the start of the batch job, copy input files from the home directory to the scratch space, and at the end of the run copy the output back to your home directory.
It is important to keep in mind that ALL users must remove excess files on their own, preferably within the batch job once the computation has finished. Leaving files in any scratch space creates an impediment to other users who are trying to run their own jobs. Delete all extra files from any space other than your home directory as soon as they are no longer needed.
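A minimal sketch of this flow in a Slurm batch script is shown below; the choice of scratch file system, the directory names, and the program name (my_program) are placeholders to adapt to your own workflow:

#!/bin/bash
#SBATCH --time=8:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=16

# Job-specific scratch directory; the scratch file system and paths are placeholders
SCRDIR=/scratch/general/vast/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR

# Stage input data from the home directory to scratch
cp $HOME/myproject/input.dat $SCRDIR/
cd $SCRDIR

# Run the calculation, reading and writing in scratch (my_program is a placeholder)
$HOME/myproject/my_program input.dat > output.dat

# Copy results back to the home directory and clean up the scratch space
cp output.dat $HOME/myproject/
rm -rf $SCRDIR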
User Environment
CHPC currently supports two shells: bash and tcsh. In addition, we use the Lmod module system to control the user environment.
CHPC provides two login files for each shell choice: the first, .tcshrc/.bashrc, sets up a basic environment, and the second, .custom.csh/.custom.sh, can be used to customize your environment. Details can be found on the modules documentation page. All new accounts are created using the modules framework, and all accounts, even those with the older-style login files, have been enabled to make use of modules.
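For instance, a bash user could add personal settings to ~/.custom.sh; the module names below are only illustrative and should be replaced with modules actually installed on the system (check with module avail):

# ~/.custom.sh - personal additions to the bash environment
export PATH=$HOME/bin:$PATH        # pick up personal scripts
module load gcc                    # illustrative module; confirm availability with "module avail"
module load openmpi                # illustrative MPI module
alias sq='squeue -u $USER'         # quick view of your own jobs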
Applications
CHPC maintains a number of user applications as well as tools needed to install applications. For some of these we have documentation pages, which can be found in the software documentation section. Also note that several packages we have installed, e.g., abaqus, ansys, charmm, comsol, star-CCM+, are licensed by individual groups and are therefore not accessible outside of those groups.
Historically, applications that are not cluster specific have been installed in /uufs/chpc.utah.edu/sys/pkg, whereas cluster-specific applications (most typically due to the use of a cluster-specific installation of MPI or another support package) are located in /uufs/$UUFSCELL/sys/pkg, where $UUFSCELL is kingspeak.peaks, ash.peaks, lonepeak.peaks, or notchpeak.peaks. Moving forward, we are working to consolidate applications in /uufs/chpc.utah.edu/sys/installdir.
Batch System
The batch implementation on the CHPC clusters is Slurm.
Any process which requires more than 15 minutes of run time needs to be submitted through the batch system. The command to submit to the batch system is sbatch, and there is also a way to submit an interactive job:
salloc -t 1:00:00 -n 4 -N 2
Other options for running jobs are given in the Slurm documentation.
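For instance, a batch script is submitted and then monitored with standard Slurm commands; the script name my_job.slurm below is just a placeholder:

sbatch my_job.slurm      # submit a batch script; Slurm prints the job ID
squeue -u $USER          # check the state of your queued and running jobs
scancel <jobid>          # cancel a job, if needed, by its job ID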
Walltime
Each cluster has a hard walltime limit on general resources as well as on jobs run as guest on owner resources. On most clusters this is 72 hours; however, please see the individual cluster guides for the specifics of a given cluster. If you find you need longer than this, please contact CHPC.
With respect to how much wall time to ask for, we suggest that you start with some small runs to gauge how long the production runs will take. As a rule of thumb, request a wall time that is 10-15% larger than the expected run time. If you specify a shorter wall time, you risk your job running out of wall time and being killed before it finishes; if you set a wall time that is too large, you may face a longer waiting time in the queue.
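For example, if small test runs suggest a production run of roughly 14 hours, requesting about 16 hours stays within the 10-15% margin (the numbers below are only illustrative):

#SBATCH --time=16:00:00    # ~15% above an estimated 14-hour production run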
Partitions and accounts
Most clusters have general nodes, available to all University affiliates with an allocation, and owner nodes, owned by research groups and available only to the members of those groups. Both the general and owner nodes are accessible to everyone, even those without an allocation or whose allocation is exhausted, in "freecycle" or "guest" mode, respectively. A "freecycle" job runs in the general (CHPC-owned) partition, whereas "guest" jobs run on owner (research-group-owned) partitions. Both "freecycle" and "guest" jobs are preemptable, i.e., they are subject to termination if a job with allocation needs the resources.
The GPU resources are handled separately. The general GPU nodes are run without allocation, but you must request to be added to the appropriate accounts. There are also owner GPU nodes which all users can use in guest mode, again with jobs subject to preemption. As there are not many GPU nodes, CHPC requests that only jobs that are making use of the GPUs be run on these resources.
In the table below we list the options for resources available to a user based on their allocation and access to owner nodes. Note that although freecycle is listed as one of the options for groups without a general allocation, we do not recommend its use unless you can automatically restart your calculation, since the chances of preemption are very high. Also, freecycle is not available if a group has an active allocation. To find out whether, and how much, general allocation a group has, see CHPC's usage page.
When using guest mode on owner nodes, you can use the Slurm feature descriptors and the constraint directive described on the CHPC Slurm page, along with information on past usage of owner nodes, to target nodes that have been less heavily used in the recent past. If you do so, please remember that past utilization is not necessarily indicative of future usage patterns.
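For example, a guest job can target a subset of owner nodes by OR-ing their feature names in a constraint; the partition and account names below come from the table further down, while "grp1" and "grp2" are placeholder feature names (see the CHPC Slurm page for the actual feature list):

#SBATCH --partition=notchpeak-guest
#SBATCH --account=owner-guest
# "grp1" and "grp2" are placeholders for owner-group feature names
#SBATCH --constraint="grp1|grp2"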
Allocations and node ownership status | What resource(s) are available
No general allocation, no owner nodes | Unallocated general nodes; allocated general nodes in freecycle mode (not recommended); guest access on owner nodes
General allocation, no owner nodes | Unallocated general nodes; allocated general nodes; guest access on owner nodes
Group owner nodes, no general allocation | Unallocated general nodes; allocated general nodes in freecycle mode (not recommended); group-owned nodes; guest access on owner nodes of other groups
Group owner nodes, general allocation | Unallocated general nodes; allocated general nodes; group-owned nodes; guest access on owner nodes of other groups
The table below lists possible ways to run on CHPC clusters. The general account name is the group name, obtained by running the groups $USER command (typically the PI's last name). The account and partition for owner nodes are typically also the group name, although there are exceptions. Available accounts and partitions can be obtained by running the sacctmgr -ps list user $USER command. The partitions are not directly listed in the output of this command, but QOSs (Quality of Service) are listed, and the partition for a given QOS typically has the same name as the QOS. An alternative way to obtain your partition and account information in an easy-to-read form is to issue the myallocation command.
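These commands can be run from any cluster login node; a brief sketch:

groups $USER                    # group membership; the general account is typically the PI's last name
sacctmgr -ps list user $USER    # accounts and QOSs in pipe-delimited form; partitions usually match QOS names
myallocation                    # CHPC tool that summarizes your partition and account information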
For details on how to target the different partitions and accounts, see the SLURM user's guide.
Execution mode | Cluster | Partition (-p) | Account (-A) |
General nodes on unallocated clusters | lonepeak, kingspeak | lonepeak, kingspeak | group-name
General nodes on allocated clusters (allocation required) | notchpeak, granite | notchpeak, granite | group-name
General GPU nodes | kingspeak | kingspeak-gpu | kingspeak-gpu
Freecycle (only if your group does not have a general allocation) | notchpeak | notchpeak-freecycle | group-name
Owner nodes (non-GPU) | kingspeak, notchpeak, lonepeak | name-kp, name-np, name-lp | name-kp, name-np, name-lp
Owner GPU nodes | kingspeak, notchpeak | name-gpu-kp, name-gpu-np | name-gpu-kp, name-gpu-np
Guest access to owner nodes (non-GPU) | kingspeak, notchpeak, lonepeak | kingspeak-guest, notchpeak-guest, lonepeak-guest | owner-guest
Guest access to owner GPU nodes | kingspeak, notchpeak | kingspeak-gpu-guest, notchpeak-gpu-guest | owner-gpu-guest
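As an illustration of the table above, an allocated job on notchpeak general nodes and a guest job on kingspeak owner nodes could carry directives like the following, where group-name stands for your own group's account:

# Allocated general nodes on notchpeak (replace group-name with your group's account name)
#SBATCH --partition=notchpeak
#SBATCH --account=group-name

# Guest access to owner nodes on kingspeak
#SBATCH --partition=kingspeak-guest
#SBATCH --account=owner-guest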
In the following individual cluster user guides, details of a typical batch script are given.
Nodes with different core counts are present on most clusters; Slurm constraints can be used to target nodes with a specific core count.
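For example, a job that needs 32-core nodes might combine a task count with a constraint; the feature name c32 is only an assumed core-count feature name, so check the individual cluster guide or scontrol show node output for the feature names actually defined:

#SBATCH --ntasks=32
#SBATCH --constraint=c32    # "c32" is an assumed core-count feature name; verify before use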
Individual Cluster User Guides
- Granite User Guide
- Notchpeak User Guide
- Kingspeak User Guide (no allocation required)
- Lonepeak User Guide (no allocation required)
- Redwood User Guide (Restricted use: See New Protected Environment)