Redwood Cluster Operating System Updated to Rocky Linux 8.10

Date published: August 21, 2024

Redwood Cluster Operating System Update

On Tuesday, August 20, 2024, the operating system of the redwood cluster nodes was updated from CentOS 7 to Rocky Linux 8.10. This document describes the changes users will experience due to this upgrade.

The ssh host keys for the redwood interactive nodes have changed. Therefore, before connecting to the redwood cluster interactive nodes, you must remove the entries for redwood1, redwood2, and any other redwood interactive nodes from the $HOME/.ssh/known_hosts file on each machine you connect from, where the old host keys are cached. We understand this is an inconvenience, but it was not possible to retain the old ssh host keys under the new operating system.
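As an illustration of the cleanup described above, here is a minimal Python sketch that filters stale redwood entries out of a known_hosts file. The function names and the sample host names used with it are hypothetical, and the sketch assumes plain (unhashed) host names without a port prefix; hashed entries (those starting with "|1|") are left untouched. The standard OpenSSH tool for this job is `ssh-keygen -R <hostname>`, which also handles hashed entries.

```python
from pathlib import Path


def drop_host_entries(lines, short_names):
    """Return the known_hosts lines that do not mention any of the given hosts."""
    kept = []
    for line in lines:
        fields = line.split()
        # Keep blank lines and comments unchanged.
        if not fields or fields[0].startswith("#"):
            kept.append(line)
            continue
        # An optional marker such as @cert-authority precedes the host field.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        hosts = host_field.split(",")
        # Match "redwood1" as well as "redwood1.<domain>", but not "redwood10".
        stale = any(
            host == name or host.startswith(name + ".")
            for host in hosts
            for name in short_names
        )
        if not stale:
            kept.append(line)
    return kept


def clean_known_hosts(path, short_names):
    """Rewrite a known_hosts file in place, saving a .bak copy first (hypothetical helper)."""
    path = Path(path)
    original = path.read_text().splitlines(keepends=True)
    path.with_name(path.name + ".bak").write_text("".join(original))
    path.write_text("".join(drop_host_entries(original, short_names)))
```

It could be called as `clean_known_hosts(Path.home() / ".ssh" / "known_hosts", ["redwood1", "redwood2"])`, extending the list with any other redwood interactive nodes you have connected to.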

Key Changes on Redwood as part of the CentOS7 → RockyLinux8 Upgrade:

Note: We have tested the following changes and believe that they will meet the needs of our user base on redwood. If issues surface for which we cannot quickly find an acceptable solution, we can temporarily disable some of the key changes until we identify a long-term solution.

  • Security-enhanced Linux (SELinux) is in targeted enforcing mode.
    • Leveraging NFS v4.2 security labels to have core cluster services confined.
  • Login/Interactive nodes have new ssh host keys, so users must clear the old entries from the “known_hosts” file on each system they connect from, usually found at ~/.ssh/known_hosts.
  • Idle login session management (Lock/Disconnect without leaving content on screen):
    • FastX - Idle Timeout Disconnect, Auth Token Expiration. (15 minutes for both).
    • SSH Shell - “Transparent Screen” session leveraged to lock on timeout. (15 minutes).
    • XForwarding disabled on the interactive nodes so that screen lock is effective.
    • Users should use FastX or OnDemand for graphical applications in place of XForwarding.
    • In both cases sessions will not be impacted.
  • More effective isolation of temporary system “spaces” to avoid data exposure via ‘pam_namespace’:
    • Note: Instanced versions of the directories noted below are on each host’s local disk instead of ‘tmpfs’.
    • Previously, ‘tmpfs’ usage competed with applications for available RAM, which increased Out Of Memory (OOM) events.
  • Instanced per-user versions of /tmp, /var/tmp, and /scratch/local on interactive nodes.
    • /scratch/local/$USER/$HOST is created for each user at login, separately from pam_namespace.
  • Instanced per-user versions of /tmp and /var/tmp on compute nodes.
    • /scratch/local is still managed by Slurm as /scratch/local/$user/$slurm_jobid, as before.
  • Local storage on interactive and compute node resources (including those owned by specific groups):
    • All systems use a dynamic system to provision local storage on each system boot.
    • This has already been the normal mode on Redwood compute nodes for the past few years.
    • Interactive nodes are now transitioning to this model too as part of this RockyLinux8 upgrade.
    • Local storage is for temporary content that is OK to lose on a node reboot, disk failure, etc.
    • There is no backup of /scratch/local on any system in the cluster.
    • Compute nodes have slurm prolog/epilog scripts that manage /scratch/local/$slurm_user/$slurm_jobid at the start and end of jobs to clean up local storage between jobs.
  • FastX Session Options:
    • XFCE as main “full desktop”
    • XFCE Terminal & xterm as main “terminal” options.
  • OS RockyLinux8.10:
    • Lustre Client: 2.15.4
    • NVIDIA Driver: 555.42.02
    • CUDA: 12.5 (other versions available via Lmod)
  • Tanium installed on interactive nodes and NFSRoot servers (should be transparent to users)
  • Significant rework of our HIPAA compliance documentation underway.
    • Documentation styled to align with CMMC documentation approaches.
    • Management access, processes, and procedures adjusted to a zone-based model to improve security posture.
    • Will engage with campus to review changes and overall posture of the new deployment.
    • Access to ‘dmesg’ has been restricted per the OSCAP/STIG controls.
  • User crontabs on the interactive/Login nodes will not be carried forward.
    • Users are instead encouraged to create new cron entries as needed.
    • For a limited time, we can access the previous setup to show users a copy of their old crontab entries on request.
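The /scratch/local layout described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the cluster's actual provisioning logic: the function name is hypothetical, SLURM_JOB_ID is the standard Slurm job environment variable, and it is an assumption that $HOST corresponds to the host name passed in by the caller.

```python
import os
import socket
from pathlib import PurePosixPath

SCRATCH_ROOT = PurePosixPath("/scratch/local")


def local_scratch_dir(env, hostname):
    """Guess the per-user local scratch directory for the current context.

    Inside a Slurm job (SLURM_JOB_ID is set) the path follows the
    /scratch/local/$user/$slurm_jobid pattern; otherwise it follows the
    /scratch/local/$USER/$HOST pattern used on interactive nodes.
    """
    user = env["USER"]
    job_id = env.get("SLURM_JOB_ID")
    # Assumption: the last component is the job ID in a job, else the host name.
    leaf = job_id if job_id else hostname
    return SCRATCH_ROOT / user / leaf
```

A script could call `local_scratch_dir(os.environ, socket.gethostname())` to pick a working directory. Anything written there should be treated as disposable, since local storage is not backed up and is cleaned between jobs.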

If you have any difficulties with the redwood cluster or questions about the new operating system, please contact us at helpdesk@chpc.utah.edu.

 

Last Updated: 12/17/24