Node Sharing Live on All General Environment Clusters
Posted: July 1, 2019
We have enabled node sharing on all CHPC compute nodes in the general environment (all compute nodes on notchpeak, kingspeak, ember, lonepeak, tangent and ash). Prior to this change, node sharing was an option only for select owner node partitions (upon request of that group), as well as already being in place as the only option on both the gpu nodes and the two AMD notchpeak nodes (notch081 and notch082).
Node sharing gives a user the option to submit a job that uses only a portion of a node, which is useful if your application cannot make use of all of the processors on a node. Note that when running a job in shared mode, you MUST specify the number of tasks as well as the amount of memory your job requires; a minimal example batch script is sketched after the documentation links below. Documentation on Slurm, node sharing, and methods to run multiple jobs inside a single batch script can be found at:
https://www.chpc.utah.edu/documentation/software/slurm.php
https://www.chpc.utah.edu/documentation/software/node-sharing.php
https://www.chpc.utah.edu/documentation/software/serial-jobs.php
There is also a summary of node sharing in the Summer 2019 newsletter, available at:
https://www.chpc.utah.edu/news/newsletters/summer2019_newsletter.pdf
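As a quick illustration of the shared-mode requirements above, here is a minimal batch script sketch. The account name (myaccount), the shared partition name (notchpeak-shared), and the program name (my_program) are placeholders; substitute your own group's account, the shared partition of the cluster you are using, and your executable.

#!/bin/bash
#SBATCH --partition=notchpeak-shared   # shared partition of the target cluster (placeholder)
#SBATCH --account=myaccount            # your group's account (placeholder)
#SBATCH --ntasks=4                     # number of tasks -- must be specified for shared jobs
#SBATCH --mem=16G                      # memory required -- must be specified for shared jobs
#SBATCH --time=02:00:00                # wall time request

# run the application on the requested cores
srun ./my_program

Because only part of the node is requested, Slurm can place other users' jobs on the remaining cores and memory of the same node.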
When running sinfo or si (if you have the alias for it provided on the CHPC Slurm documentation page), you will now see each set of nodes listed multiple times -- for general nodes on a cluster under allocation these will be cluster, cluster-shared, cluster-freecycle, and cluster-shared-freecycle; if the cluster is not under allocation, the freecycle partitions are not present. In addition, there is a new node state “mix”, which refers to a node that is partially allocated, whereas a node that is completely allocated shows the state “alloc”, regardless of whether it is running a single job or multiple jobs via node sharing. An example of this is below (only the first owner partition is shown); a couple of commands for inspecting a partially allocated node are sketched after the output.
$ sinfo
PARTITION                   AVAIL  TIMELIMIT   NODES  STATE   NODELIST
kingspeak*                  up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-shared            up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-gpu               up     3-00:00:00      4  mix     kp[297-300]
kingspeak-freecycle         up     3-00:00:00      4  mix     kp[297-300]
kingspeak-freecycle         up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-shared-freecycle  up     3-00:00:00      4  mix     kp[297-300]
kingspeak-shared-freecycle  up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-guest             up     3-00:00:00      1  drain$  kp145
kingspeak-guest             up     3-00:00:00      3  down$   kp[144,146-147]
kingspeak-guest             up     3-00:00:00      1  comp    kp334
kingspeak-guest             up     3-00:00:00      1  mix     kp257
kingspeak-guest             up     3-00:00:00    193  alloc   kp[033-035,037-099,106-108,112-115,117-120,122-143,148-157,228-237,246-256,258-259,261-264,266-274,276,278-280,293-296,301-305,308-309,311,318-323,327-332,345-347,353-356,358,363-367,378,380-381,384-387]
kingspeak-guest             up     3-00:00:00     66  idle    kp[036,101-105,116,121,260,265,275,277,281-292,306-307,310,312-317,324-326,333,335-344,348-352,357,368-377,379,382-383]
kingspeak-shared-guest      up     3-00:00:00      1  drain$  kp145
kingspeak-shared-guest      up     3-00:00:00      3  down$   kp[144,146-147]
kingspeak-shared-guest      up     3-00:00:00      1  comp    kp334
kingspeak-shared-guest      up     3-00:00:00      1  mix     kp257
kingspeak-shared-guest      up     3-00:00:00    193  alloc   kp[033-035,037-099,106-108,112-115,117-120,122-143,148-157,228-237,246-256,258-259,261-264,266-274,276,278-280,293-296,301-305,308-309,311,318-323,327-332,345-347,353-356,358,363-367,378,380-381,384-387]
kingspeak-shared-guest      up     3-00:00:00     66  idle    kp[036,101-105,116,121,260,265,275,277,281-292,306-307,310,312-317,324-326,333,335-344,348-352,357,368-377,379,382-383]
kingspeak-gpu-guest         up     3-00:00:00      4  mix     kp[359-362]
lin-kp                      up     14-00:00:0    14  alloc   kp[033-035,037-047]
lin-kp                      up     14-00:00:0     1  idle    kp036
lin-shared-kp               up     14-00:00:0    14  alloc   kp[033-035,037-047]
lin-shared-kp               up     14-00:00:0     1  idle    kp036
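To see what makes a node show the “mix” state, you can look at the jobs and allocations on it with standard Slurm commands. As a sketch, using node kp257 from the output above (substitute whichever node you are interested in):

$ squeue -w kp257                      # list the jobs currently running on node kp257
$ scontrol show node kp257 | grep -E "CPUAlloc|RealMemory|AllocMem"

The squeue output will typically show several jobs sharing the node, and the scontrol output shows how many of the node's CPUs and how much of its memory are currently allocated versus the node's totals.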
If there are any questions, please contact us via helpdesk@chpc.utah.edu.