Slurm Priority Scoring

For each job submitted to a CHPC cluster via Slurm, Slurm assigns a priority score. This score is an integer value that determines the job's position in the Slurm queue relative to other jobs, and that position largely determines how long the job will wait before it starts running.

At the CHPC, multiple factors determine a job's priority score. This article discusses each of these factors and then describes ways to reduce your wait time in the queue.

Factors that Set Slurm Job Priority at the CHPC

The CHPC's installation of Slurm takes four factors into account when determining the priority score of a submitted job: the quality of service (QOS), the age of the job, our fairshare policy, and the job size. These factors apply only to CHPC-owned resources, not to owner resources on the clusters. The CHPC also limits the number of jobs per user that can accrue priority based on the age factor: while a user can submit up to 1000 jobs per QOS, only the first 5 submitted jobs accrue priority based on age.

The four factors that determine a Slurm job's priority score at the CHPC are as follows:

(1) QOS -- The first and most significant portion of a job's priority score is based on the QOS used. Each account is associated with at least one QOS per partition. By default, the vast majority of QOSes give each job a base priority of 100,000 points. There are a few exceptions, such as the freecycle partitions, which receive a base priority of 10,000 points.

(2) Age -- The age factor is the amount of time a job has spent waiting in the queue. A job accrues a small number of points as it waits, with the priority growing roughly linearly until it hits a cap at two weeks of accrual.

(3) Fairshare -- Fairshare is a factor based on a user's usage over the previous two weeks. A user who has used the system less in that period receives a small priority bonus (1 or 2 points) over a user who has used it more heavily. This policy, along with the limit of 5 jobs per QOS accruing age-based priority, helps ensure that all users can use the CHPC environment fairly (see the example commands after this list for how to inspect your fairshare standing).

(4) Job Size -- Job size is a fixed value added to the job's priority score at submit time, determined by the number of nodes/cores the job requests. Because larger jobs tend to wait in the queue longer, due to the large number of resources requested, they are sometimes given a slightly higher priority to prevent them from sitting in the queue too long.
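
If you want to look at the values behind factors (1) and (3) yourself, the standard Slurm accounting commands below are a minimal sketch of how to do so; the exact columns available depend on how Slurm accounting is configured at the CHPC, so treat these as illustrative rather than authoritative.

    # List QOS names and their base priority weights (factor 1)
    sacctmgr show qos format=Name,Priority

    # Show your fairshare usage and effective fairshare value (factor 3)
    sshare -u $USER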

If you are interested in understanding the factors behind your job's priority score, you can run the 'sprio' command to see the current priority, broken down into the four components described above, for all pending jobs in the queue on a cluster.
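
For example, the following sprio invocations (a short sketch using standard sprio options) show the per-component breakdown for all pending jobs, for your own jobs, and for a single job:

    # Per-component priority breakdown for all pending jobs
    sprio -l

    # Restrict the output to your own jobs
    sprio -l -u $USER

    # Show the breakdown for one specific job ID (replace 1234567)
    sprio -l -j 1234567

    # Show the configured weight of each priority factor
    sprio -w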

 

Methods to Limit Your Wait Time in the Slurm Queue

The most impactful change users can make to limit their time in the Slurm queue is to make sure that the resources requested in their #SBATCH directives align with the requirements of their job. Many users overask for resources, requesting more time, CPUs, and memory than their jobs need. By overasking, these users spend more time in the Slurm queue than they likely would have if their requests more closely matched their job's actual requirements.
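
As an illustration (the partition, account, and resource values below are placeholders, not recommendations), a job script that requests only what the job actually needs might look like this:

    #!/bin/bash
    #SBATCH --partition=notchpeak-shared   # example partition; use one you have access to
    #SBATCH --account=your-account         # placeholder account name
    #SBATCH --time=02:00:00                # request the time the job actually needs
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4              # only the CPUs the program can actually use
    #SBATCH --mem=8G                       # memory sized to the job, not the whole node

    ./my_program                           # placeholder for your actual workload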

There are a number of ways that you can both increase your job efficiency and limit the time your job spends in the Slurm queue:

Check your Job Efficiency

There are a number of ways to check job efficiency. To get an overall look at your job(s), use the SUPReMM module of XDMoD. SUPReMM uses system-level measurements and is more accurate than the reports from Slurm. XDMoD can be accessed either through the OnDemand home pages (General Environment and Protected Environment) or through the CHPC's XDMoD web interface. To filter by job and user ID in XDMoD, follow these steps:

- Log in to XDMoD, then select the "Job Viewer" tab (rightmost).
- Select Search, and in that window, Advanced Search.
- Set the Start and End dates for the search.
- Realm: Jobs.
- Filter: User = user name (actual name: Last, First).
- Click Add, then Search.

Please be aware that XDMoD assumes hyperthreading, so it reports as if there were twice as many CPUs as are physically present; as a result, 100% CPU usage shows as only 50% in the XDMoD reports.

Other methods for checking job efficiency are the commands top, atop, and sar. While a job is running, the CHPC allows the user to ssh to the compute node where the job is actively running. Once there, users can run the aforementioned commands to view CPU load by process, which is a good first estimate of whether the program is running efficiently.
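
A minimal sketch of this workflow, assuming your job ID is 1234567 and the node name reported by squeue is directly reachable via ssh:

    # Find which node(s) your job is running on
    squeue -j 1234567 -o "%i %N"

    # Log in to that node (replace np123 with the node name from squeue)
    ssh np123

    # Watch CPU usage of your own processes
    top -u $USER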

At the node level, we tend to look at "sar", which gives average CPU load in 10-minute intervals and is a good estimate of how utilized the node is overall. Other sar flags show memory usage, network traffic, and so on.
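
For instance (a sketch using standard sar options; the reporting interval depends on how sysstat is configured on the node):

    # CPU utilization for the current day, at the configured reporting intervals
    sar -u

    # Memory utilization
    sar -r

    # Network traffic per interface
    sar -n DEV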

Use Personalized Slurm Queries

Use the personalized Slurm query mysinfo to view which nodes you have access to and what their allocation state is.
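
mysinfo is a CHPC-provided wrapper around sinfo; if you prefer a plain Slurm query, something along these lines (an approximation, not the exact definition of mysinfo) reports partition, node state, and node names:

    # Partition, availability, time limit, node count, node state, and node list
    sinfo -o "%20P %5a %12l %6D %8t %N"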

Use other Slurm Partitions

Many users request Slurm partitions that, by default, allocate a whole node even though their jobs only require a small portion of the node's resources. A more efficient choice is one of the shared partitions. If the job needs 8 hours or less to run, the user can request the notchpeak-shared-short partition.
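
As a sketch (the account name is a placeholder, and the resource values should be sized to your actual job), the relevant directives for such a job might look like:

    #SBATCH --partition=notchpeak-shared-short
    #SBATCH --account=your-account     # placeholder; use the account CHPC assigns for this partition
    #SBATCH --time=04:00:00            # 8 hours or less on this partition
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=2
    #SBATCH --mem=4G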

 

Other CHPC Documentation on Slurm

Looking for more information on running Slurm at the CHPC? Check out these pages. If you have a specific question, please don't hesitate to contact us at helpdesk@chpc.utah.edu.

Slurm Job Preemption and Restarting of Jobs

Scheduling Jobs at the CHPC with Slurm

MPI with Slurm

Running Independent Serial Calculations with Slurm

GPUs with Slurm

Accessing CHPC's Data Transfer Nodes (DTNs) through Slurm

Other Slurm Constraint Suggestions and Owner Node Utilization

Sharing Nodes Among Jobs with Slurm

Personalized Slurm Queries

Moab/PBS to Slurm

Last Updated: 9/20/24