Owner Nodes and Using Slurm Constraints
Slurm jobs submitted to guest partitions, using #SBATCH --account owner-guest and #SBATCH --partition cluster-guest (substituting the proper cluster name), are eligible for preemption by jobs submitted by the group that owns the nodes. To help minimize the chance of preemption and avoid wasting time and other resources, nodes owned by groups with (historically) low utilization can be targeted directly in batch scripts and interactive submissions. Note that any constraint suggestions are based solely on historical usage information and are not indicative of the future behavior of any group.
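As a minimal sketch (assuming the notchpeak cluster and the "ucgd" owner group used as an example later on this page; the resource amounts and program name are placeholders), a guest batch script header might look like this:

```bash
#!/bin/bash
# Guest access to owner nodes; substitute the proper cluster name in the partition.
#SBATCH --account=owner-guest
#SBATCH --partition=notchpeak-guest
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=02:00:00
# Optional: steer the job toward a low-utilization owner group's nodes.
#SBATCH -C "ucgd"

./my_program   # placeholder for your actual workload
```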
Utilization of Owner Nodes by Cluster
Understanding how owners utilize their own nodes can be helpful for users who want to shorten their jobs' wait time in the Slurm queue by accessing the <cluster>-guest and <cluster>-gpu-guest partitions. The CHPC provides graphics that detail the use of owner node partitions by the owners.
Information about owner utilization is presented as a heatmap and is generated from Slurm logs. Lighter colors mean fewer nodes with the given node feature were in use by owner groups, which is beneficial for guest jobs, while darker colors mean more of the nodes were being used by the owners.
Please click on a link below to view utilization of owner nodes by the owners for each cluster:
If the images are not updating, it may be because your browser is caching older versions. Try an uncached reload of the page.
How to Use Constraint Suggestions
Both the utilization of the owner nodes and the size of the pool of nodes must be considered when selecting constraints; if an owner group has many nodes available and utilizes only some of them, the remainder will still be available for guest jobs. Selecting constraints for both effective pool size and owner utilization, then, can help further reduce the likelihood of preemption.
Constraints, specified with #SBATCH -C or #SBATCH --constraint, can be used to target specific nodes based on requirements a job may have, such as memory, number of CPUs, and even the name of a node or owner group. The ability to use constraints allows for a finer-grained specification of resources.
Features commonly specified are:
- Core count on node: The core count is denoted as c#, e.g., c16. To request 16-core nodes, use #SBATCH -C c16.
- Amount of memory per node: This constraint takes the form m#, where the number is the amount of memory in the node in GB, e.g., m32. To request nodes with exactly 32 GB of memory, use #SBATCH -C m32.
  - IMPORTANT: There is a difference between the memory constraint #SBATCH -C m32 and the batch directive #SBATCH --mem=32000. With #SBATCH --mem=32000, you specify the number as it appears in the MEMORY entry of the si command, which is in MB. The job will then only be eligible to run on a node with at least this amount of memory, and it will be restricted to using only 32 GB even if the node has more memory than this value. With #SBATCH -C m32, the job will only run on a node with 32 GB of memory, but it will have access to all of the memory of the node. See the sketch after this list for a side-by-side comparison.
- Node owner: This can be used to target, as a guest, nodes of specific groups with low owner use in order to reduce the chance of being preempted. For example, to target nodes owned by the group "ucgd", use #SBATCH -C "ucgd". Historical usage (the past 2 weeks) of different owner node groups can be found on CHPC's constraint suggestion page.
- GPUs: For the GPU nodes, the specified features include the GPU line, e.g., geforce or tesla, and the GPU type, e.g., a100, 3090, or t4. There is additional information about specifying the GPUs being requested for a job on CHPC's GPU and Accelerator page.
- Processor architecture: This is currently only available on notchpeak and redwood. It is useful for jobs where you want to restrict the processor architecture used for the job. Examples are bwk for Intel Broadwell, skl for Intel Skylake, csl for Intel Cascade Lake, icl for Intel Icelake, npl for AMD Naples, rom for AMD Rome, and mil for AMD Milan.
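To make the memory distinction above concrete, here is a hedged sketch contrasting the two directives; only one of them would normally appear in a given script:

```bash
# Option A: node-feature constraint. The job runs only on nodes that have
# 32 GB of memory, and it may use all of the memory on the node.
#SBATCH -C m32

# Option B: explicit memory request in MB. The job can run on any node with
# at least 32000 MB, but its usable memory is capped at roughly 32 GB.
#SBATCH --mem=32000
```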
Multiple constraints can be specified at once with logical operators in Slurm directives. This allows for submission to nodes owned by one of several owner groups at a time (which might help reduce queue times and increase the number of nodes available) as well as the specification of exact core counts and available memory.
To select from multiple owner groups' nodes, use the "or" operator; a directive like #SBATCH -C "group1|group2|group3" will select nodes matching any of the constraints listed. By contrast, the "and" operator can be used to achieve further specificity in requests. To request nodes owned by a group and with a particular amount of memory, for example, a directive like #SBATCH -C "group1&m256" could be used. (This will only work where multiple node features are associated with the nodes and the combination is valid. To view the available node features, the sinfo aliases si and si2 documented on the Slurm page are helpful.)
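If you prefer not to rely on the si and si2 aliases, a plain sinfo format string will also list each node's feature tags; this is generic Slurm usage rather than anything CHPC-specific, and the column widths and partition name below are only illustrative:

```bash
# List node names, core counts, memory (MB), and the feature tags
# (c#, m#, owner group, architecture) that can be used with #SBATCH -C.
sinfo -o "%20N %8c %10m %60f"

# Restrict the listing to a single partition, e.g. a <cluster>-guest partition.
sinfo -p notchpeak-guest -o "%20N %8c %10m %60f"
```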
When using Open OnDemand, enter only the constraint string into the Constraints text entry, e.g., "group1|group2|group3".
CPU Microarchitecture Constraints
Due to the variety of CPU microarchitectures on some CHPC clusters, each node is identified with a specific three-letter constraint that specifies the node's CPU microarchitecture. Use these constraints to restrict runs to certain CPU types. The most common restriction is to use only Intel or only AMD nodes, since some codes don't work when CPUs of both manufacturers are in a single job. For example, to use only AMD nodes, use #SBATCH -C "rom|mil|gen".
Notchpeak
- skl: Intel Sky Lake microarchitecture (Xeon 51xx or 61xx)
- csl: Intel Cascade Lake microarchitecture (Xeon 52xx or 62xx)
- icl: Intel Ice Lake microarchitecture (Xeon 53xx or 63xx)
- srp: Intel Sapphire Rapids microarchitecture (Xeon 54xx or 64xx)
- npl: AMD Naples microarchitecture (Zen1, EPYC 7xx1)
- rom: AMD Rome microarchitecture (Zen2, EPYC 7xx2)
- mil: AMD Milan microarchitecture (Zen3, EPYC 7xx3)
- gen: AMD Genoa microarchitecture (Zen4, EPYC 9xx1)
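By analogy with the AMD example above, a job that should stay on Intel nodes on notchpeak could combine the Intel feature tags listed here (a sketch; adjust the list if your cluster offers other Intel generations you also want to allow):

```bash
# Run only on Intel nodes on notchpeak
# (Sky Lake, Cascade Lake, Ice Lake, or Sapphire Rapids).
#SBATCH -C "skl|csl|icl|srp"
```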
More Information on Slurm
Looking for more information on running Slurm at the CHPC? Check out these pages. If you have a specific question, please don't hesitate to contact us at helpdesk@chpc.utah.edu.
Setting up a Slurm Batch Script or Interactive Job Session
Slurm Priority Scoring for Jobs
Accessing CHPC's Data Transfer Nodes (DTNs) through Slurm
Other Slurm Constraint Suggestions and Owner Node Utilization