CHPC DOWNTIME: General Environment Clusters - May 25&26, 2022
Date Posted: April 21, 2022
CHPC clusters OS update information
Posted: May 28, 2022
We would like to remind everyone that all CHPC clusters were upgraded to a new operating
system, Rocky Linux 8, during the downtime. As a result of this, some programs no longer
function as they did before, and users may need to use new versions, or new module names,
to achieve the previous functionality. These changes are listed at:
Before opening a support ticket, please go over this document, focusing in particular
on the changes to compilers, MPIs, Python, R, and libraries (especially NetCDF).
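As an illustration of how to track down renamed or reversioned modules, the commands below use Lmod-style module queries (the `netcdf` package name is only an example); the OS check is a quick way to confirm a node is on the new system:

```shell
# Confirm which OS this node is running (Rocky Linux 8 reports ID="rocky")
grep '^ID=' /etc/os-release

# If the module command is available, search the rebuilt module tree for
# every installed version of a package (netcdf is an example name):
if command -v module >/dev/null 2>&1; then
    module spider netcdf   # all versions, plus any prerequisite modules
    module avail           # modules visible with the current compiler/MPI
fi
```

`module spider` searches the whole tree regardless of which compiler or MPI is currently loaded, which makes it the more reliable query after a rebuild like this one.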
CHPC - End of Downtime
Posted: May 27, 2022
The downtime has ended.
All clusters and frisco nodes have been released.
If you have any questions, please let us know.
UPDATE #2
Posted: May 26, 2022
The migration of the home directories of the general HPC space was completed. These
home directories are available for access via Samba mounts and on the general environment
HPC clusters (once they are back in service).
Lonepeak and the frisco nodes have been returned to service, and the reservation on
the batch system on lonepeak has been removed. Work on the OS update of the remaining
clusters - ash, kingspeak, and notchpeak - is continuing.
Reminder: As mentioned in the 5/20/2022 announcement about the new /scratch/general/vast
file system, the /scratch/kingspeak/serial space has been made read-only as a first
step in the retirement of this scratch file system. Please copy any content that you
need elsewhere by the end of June. In early July the /scratch/kingspeak/serial
space will be unmounted.
UPDATE #1
Posted May 25, 2022
The updates on Narwhal and Beehive have been completed and both have been opened for
user access. Support for path lengths longer than the default 260 characters has been
enabled, in preparation for mounting the Linux home and group spaces more easily.
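The 260-character ceiling is the classic Windows MAX_PATH limit. On recent Windows Server builds it can be lifted with the LongPathsEnabled registry value; whether this exact mechanism was used here is an assumption, but the setting looks like:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
; 1 enables Win32 paths longer than 260 characters
; (assumed mechanism; applications must also opt in via their manifest)
"LongPathsEnabled"=dword:00000001
```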
In addition, the migration of the sys branch, which houses the CHPC-installed applications,
to the new VAST file system has been completed. There should be no issues using these
applications on the redwood cluster or on any VMs and stand-alone servers.
Note that the migration of the home directories in the hpc-home space has not yet
been completed; it will be finished tomorrow.
Also note that the clusters in the general environment, including the frisco nodes
and the cryosparc servers, remain offline. Once the work on the power infrastructure
is completed in the morning, the plan is to bring lonepeak, the frisco nodes and the
cryosparc servers back online. After that, work will continue on the OS upgrade on
ash, kingspeak and notchpeak.
Reminder and Additional Impact
Posted: May 11th, 2022
CHPC will have a two-day downtime of the general environment HPC clusters on Wednesday & Thursday, May 25 & 26, 2022, starting at 7 am.
A reminder that CHPC will have a two-day downtime impacting the clusters in the
general environment, due to work on the DDC power infrastructure starting at 8
am May 25, 2022. During this outage, CHPC will be moving the OS of the general environment
clusters ash, kingspeak, and notchpeak from CentOS 7 to Rocky Linux 8, as mentioned in
the initial announcement (below).
CHPC staff is adding three additional tasks to this downtime window:
- OS updates for both the Beehive and Narwhal Windows servers. These two servers will be unavailable starting at 8 am May 25th. Once the OS updates are complete and the servers have been returned to service, an announcement will be sent to the CHPC mailing list.
- Migration of the sys branch to the new home directory solution. The sys branch includes the CHPC application tree, containing all CHPC-installed end-user applications. Note that this will impact jobs on redwood, the protected environment HPC cluster, as well as on any stand-alone Linux systems and virtual machines that make use of CHPC-installed applications. This migration will start at 8 am May 25th, and a notice will be sent out when it has been completed.
- Migration of the home directories in the CHPC general environment HPC space. This includes the home directories of all users who have the default 50 GB home directories, i.e., users in groups that have not purchased home directory space. The home directories of users in groups that have purchased home directory space will be migrated at a later date. This migration will also start at 8 am May 25th, with a notice sent out when it has been completed.
Posted April 21, 2022
Due to the need for the replacement of breakers in the Downtown Data Center (DDC)
power infrastructure that supports the general environment clusters -- ash, kingspeak, lonepeak, and notchpeak, including
the frisco nodes and the cryosparc nodes -- CHPC will have a two-day outage at the end of May. The outage will only be for
the general environment clusters, including both the compute and the interactive nodes.
Note that the breaker replacement is part of the maintenance of the DDC power infrastructure;
this is the first time these breakers have been replaced since CHPC has been housed
at the DDC.
As this is a longer downtime than is typical, we are providing advance notice to allow
CHPC users time to plan accordingly.
Reservations are in place to drain the clusters mentioned above of all running jobs
by 7 am on May 25, to allow CHPC staff time to power down the nodes before the work
on the breakers starts at 8 am.
The work on the breakers requires two windows where there will be no power -- the
first will be the morning of May 25 and the second the morning of May 26.
CHPC will take advantage of this power maintenance to update the OS from CentOS 7 to
Rocky Linux 8 on ash, kingspeak, and notchpeak. In the time between the two power outage
windows, CHPC will start the update process. After the second outage window, we will
first bring the frisco nodes and the lonepeak cluster back to service, followed by
completing the OS update on the remaining clusters before returning them to service.
If you have not already started to test your workflows on the new OS, we recommend
you do so before this downtime.
Please let us know, via helpdesk@chpc.utah.edu, if you have any questions or concerns.