Alert

Cluster Maintenance: Tues Dec 19, 2017; 8AM-10PM Full Outage

Dear Users,

We will have our next regularly scheduled quarterly maintenance downtime on Tues December 19 from 8AM to 10PM. This maintenance period includes upgrading network switch firmware and migrating compute nodes from the Frank cluster into the H2P cluster. We will retire Frank at the conclusion of the maintenance. The CRC cluster that will serve the University going forward is H2P:

https://crc.pitt.edu/documentation/h2p/

Please submit a ticket if you need help transitioning your job submission scripts from the PBS to the SLURM queue environment.
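As a rough illustration of what such a transition involves, the sketch below rewrites a few common `#PBS` directives as `#SBATCH` directives. The option mapping is a general PBS-to-SLURM correspondence, not taken from CRC documentation, and the function names are hypothetical. Queue names on Frank will not necessarily match partition names on H2P, and environment variables such as `$PBS_O_WORKDIR` are not handled, so treat the output as a starting point only.

```python
import re

# Assumed mapping of simple PBS options to SLURM long options.
# This is a general correspondence, not a CRC-specific one.
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-q": "--partition",   # queue and partition names may differ between clusters
    "-o": "--output",
    "-e": "--error",
}

def translate_directive(line):
    """Translate one '#PBS <flag> <value>' line into a list of '#SBATCH' lines.

    Lines that are not recognised are returned unchanged so that
    nothing is silently dropped from the script.
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] != "#PBS":
        return [line]
    flag, value = parts[1], parts[2]
    if flag in SIMPLE_FLAGS:
        return [f"#SBATCH {SIMPLE_FLAGS[flag]}={value}"]
    if flag == "-l":
        # Walltime request, e.g. walltime=24:00:00
        m = re.fullmatch(r"walltime=(\d+:\d{2}:\d{2})", value)
        if m:
            return [f"#SBATCH --time={m.group(1)}"]
        # Node and cores-per-node request, e.g. nodes=2:ppn=16
        m = re.fullmatch(r"nodes=(\d+):ppn=(\d+)", value)
        if m:
            return [
                f"#SBATCH --nodes={m.group(1)}",
                f"#SBATCH --ntasks-per-node={m.group(2)}",
            ]
    return [line]

def translate_script(text):
    """Translate every line of a PBS job script, keeping the line order."""
    out = []
    for line in text.splitlines():
        out.extend(translate_directive(line))
    return "\n".join(out)

if __name__ == "__main__":
    pbs_script = "\n".join([
        "#!/bin/bash",
        "#PBS -N myjob",
        "#PBS -l nodes=2:ppn=16",
        "#PBS -l walltime=24:00:00",
        "./run_simulation",
    ])
    print(translate_script(pbs_script))
```

Unrecognised directives pass through untouched, which makes them easy to spot and convert by hand.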

Maintenance Completed and Services Restored

Thank you for your patience. All clusters are running and accepting new jobs. Please submit a ticket if you encounter problems:

http://core.sam.pitt.edu/node/add/support-ticke

H2P cluster back online

Dear Users,

The H2P cluster is now processing queued jobs and accepting new jobs. With help from the vendor, we have resolved the problem with IB Gateway 1. We are working through some residual issues but hope to have the HTC, MPI, and Frank clusters back online soon.

Please submit a ticket if you encounter problems:

http://core.sam.pitt.edu/node/add/support-ticket

Update: Cluster Maintenance Will Continue onto Wednesday

Dear Users,

I am sorry to report that we encountered a significant problem while upgrading the firmware on one of the IB Gateway switches. The vendor was present during the upgrade and is aware of the problem. Because the IB Gateways provide connectivity for our Mobydisk Lustre filesystem and our Infiniband fabric, this issue directly impacts the HTC, MPI, and Frank clusters. While we believe this outage has limited impact on the H2P cluster, we wish to reassess the situation in the morning before attempting to bring part of the compute resources back online.

Extended into the night: CRC Cluster Maintenance

Our apologies ... maintenance is still ongoing. We will post an update when we have brought up a substantial number of compute resources. Sorry for the spam.

Extended until 7PM: CRC Cluster Maintenance

Upgrades to the Infiniband Gateway switches are taking longer than anticipated. We are now targeting 7PM to bring services back online.

CRC Clusters in Maintenance Mode

We aim to have everything back up by 5PM.

Cluster Maintenance: Tues June 6, 2017; Full Outage

Dear Users,

We will have our scheduled quarterly cluster maintenance on Tuesday June 6 from 8AM to 5PM. The earlier post indicated a partial outage for certain clusters, but after further review of the network, Infiniband switch, and Omni-Path switch upgrades, we have decided to power down all CRC resources for the maintenance. This full outage will permit us to restore compute services cleanly and more quickly when all upgrades have been completed. Please plan your compute workflow accordingly.

CRC Team.

CRC Symposium: Urban Computing and Machine Learning. Thurs March 2. University Club Ballroom A

Dear Users,

We invite you to our third annual Advancing Research through Computing Symposium, which will be held at the University Club on Thursday March 2. The themes for the symposium are Urban Computing and Machine Learning.

Students are encouraged to participate in our poster competition for a $500 travel stipend. Any topic that uses computing to advance our understanding of our world is eligible.

Full Cluster Maintenance: Monday March 6, 8AM-5PM

Dear Users,

We are scheduling a full cluster outage for maintenance on Monday March 6 from 8AM to 5PM. Because this maintenance involves the $HOME storage array, all clusters (Frank, MPI, SMP, HTC) must power down. We will place a system-wide reservation in the queues that will block a job from running should its requested walltime intersect this period. Jobs will remain queued and will start running again once we release the reservation at the completion of maintenance.

Please plan your computing accordingly.
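To judge whether a job can still run before the outage, the reservation rule described above can be sketched as a simple interval check. This is only an illustration of the idea, not the scheduler's actual logic. The function name `can_start` is hypothetical, and the year 2017 is inferred from the weekday rather than stated in the announcement.

```python
from datetime import datetime, timedelta

# Maintenance window from the announcement (Monday March 6, 8AM to 5PM).
# The year 2017 is an assumption inferred from the weekday.
MAINT_START = datetime(2017, 3, 6, 8, 0)
MAINT_END = datetime(2017, 3, 6, 17, 0)

def can_start(now, walltime, res_start=MAINT_START, res_end=MAINT_END):
    """Return True if a job started at `now` with the requested `walltime`
    would not intersect the reservation window.

    A job is allowed if it finishes by the start of the window
    or begins at or after the end of the window.
    """
    job_end = now + walltime
    return job_end <= res_start or now >= res_end

if __name__ == "__main__":
    sunday_evening = datetime(2017, 3, 5, 20, 0)
    # A 6-hour request ends at 2AM Monday, before the window opens.
    print(can_start(sunday_evening, timedelta(hours=6)))
    # A 24-hour request would run into the window, so the job stays queued.
    print(can_start(sunday_evening, timedelta(hours=24)))
```

In practice this means that requesting a shorter walltime may let a job run before the outage instead of waiting until the reservation is released.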
