Report problems - trouble@stat.berkeley.edu
Read our docs - https://computing.stat.berkeley.edu

Monthly tape backups this Sunday
We will be running the monthly backups on Sunday, November 2, beginning in the afternoon.
Computers will not be rebooted or made unavailable, and no processes will be terminated. You may experience slightly slower input/output performance until the dumps are complete.
New website page.
We have a new documentation page on using AI coding assistants, particularly in connection with the SCF.
https://computing.stat.berkeley.edu/software/ai-coding/
Slurm configuration change.
We have set the default partition on the SCF cluster to be the high partition with a 28-day time limit.

Monthly tape backups this Sunday
We will be running the monthly backups on Sunday, May 4, beginning in the afternoon.
Computers will not be rebooted or made unavailable, and no processes will be terminated. You may experience slightly slower input/output performance until the dumps are complete.
Short downtime to increase fileserver bandwidth.
We’ll be doing some maintenance on the network interfaces of our file server starting today at 2pm, with the goal of increasing its network bandwidth. We anticipate that the outage will be very brief (one minute); however, there is a slim possibility that we will have to reboot the machine. Running jobs should not be terminated in either case.
Lambda cluster unavailable due to unknown problem.
Investigating 2025-04-12
The lambda head node is not accessible. I’ve asked the data center folks (in Washington) to investigate.
Resolved 2025-04-16 11:00a
The lambda cluster’s head node was having network interface problems. The problems have cleared for the time being.
Updated default Python kernel and the remote desktop feature.
I have updated the default Python kernel from 3.12 to 3.13 to match the login environment of our recently upgraded systems. The previous kernel is still available as “Python 3.12” in the JupyterLab launcher.
I’ve also updated the remote desktop feature. The latest version supports copy/paste and desktop resizing, has a link back to the hub, and uses a Unix socket for security.
We have upgraded Python and R following the OS upgrades.
With the recent operating system upgrade to Ubuntu 24.04, we have also upgraded to Python 3.13 and R 4.4.3. Older versions are available via the module system.
We are upgrading the operating system on most systems.
We will be performing operating system upgrades on all of our compute servers during the upcoming spring break. The upgrades will occur in batches from Monday, March 24, through Thursday, March 27, starting each morning at approximately 8:30am. We will begin with the least active nodes.
Your home and scratch directories will not be affected; however, /tmp and /var/tmp on each machine will be wiped. /data partitions, which exist on a small number of systems, will not be impacted.
We will place reservations on the systems to be upgraded about 72 hours in advance. If you currently have or plan to start jobs that you expect to run into next week, please ensure that your code can resume from where it left off if interrupted. For jobs starting before the upgrade, we recommend setting a time limit of no more than 72 hours.
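One way to make a long-running job resumable is to checkpoint its progress after each unit of work and reload that checkpoint on startup. The following is only a minimal sketch of that idea, not an SCF-provided tool: the function names, the JSON state layout, and the `stop_after` parameter (which simulates an interruption) are all hypothetical, and the arithmetic stands in for real work.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "total": 0}


def save_state(path: Path, state: dict) -> None:
    """Write to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so an interruption mid-write leaves the
    previous checkpoint intact.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_name, path)


def run(path: Path, n_steps: int, stop_after: Optional[int] = None) -> dict:
    """Run the remaining steps up to n_steps, checkpointing after each one.

    stop_after (hypothetical, for demonstration) ends this call early
    after that many steps, as an interruption would.
    """
    state = load_state(path)
    done_this_call = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_call >= stop_after:
            break
        step = state["next_step"]
        state["total"] += step  # placeholder for the real computation
        state["next_step"] = step + 1
        save_state(path, state)
        done_this_call += 1
    return state
```

Calling `run(Path("checkpoint.json"), 10)` again after an interruption continues from the saved `next_step` rather than from zero. Since /tmp and /var/tmp will be wiped during the upgrade, a checkpoint file like this belongs in your home or scratch directory.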
Self-Installed Packages
If you have installed your own R packages, you will need to reinstall them after the upgrade or request that we install them system-wide. You can find your currently installed packages in the versioned subdirectories of ~/R/x86_64-pc-linux-gnu-library. Although these directories will remain after the upgrade, R on the new system will look in ~/R/x86_64-pc-linux-gnu-library-ubuntu-24.04/. This is necessary because compiled R packages link against system libraries from the old OS, which may be missing or have different versions on the new OS. To request system-wide installation, please contact us at consult@stat.berkeley.edu.
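To see which R packages you would need to reinstall, you can list the contents of the old library directory. The sketch below is an illustration in Python rather than an SCF-provided script; the function name `list_user_r_packages` is hypothetical, and it assumes that each installed package is a directory containing a `DESCRIPTION` file, which is how R lays out installed packages.

```python
from pathlib import Path
from typing import Dict, List


def list_user_r_packages(lib_root: Path) -> Dict[str, List[str]]:
    """Map each versioned subdirectory of lib_root to the R packages in it."""
    packages: Dict[str, List[str]] = {}
    if not lib_root.is_dir():
        return packages
    for version_dir in sorted(p for p in lib_root.iterdir() if p.is_dir()):
        # An installed R package directory contains a DESCRIPTION file.
        names = sorted(
            p.name for p in version_dir.iterdir() if (p / "DESCRIPTION").is_file()
        )
        packages[version_dir.name] = names
    return packages


if __name__ == "__main__":
    # Old library location named in the announcement.
    old_lib = Path.home() / "R" / "x86_64-pc-linux-gnu-library"
    for version, names in list_user_r_packages(old_lib).items():
        print(f"R {version}: {', '.join(names) or '(none)'}")
```

The printed names can then be passed to `install.packages()` in R on the upgraded system, or included in a request to consult@stat.berkeley.edu.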
Python packages installed via conda are not affected. If you have installed Python packages using pip and the setup involves compiling shared objects, these libraries will need to be reinstalled. Compiling shared objects typically means that the package installation involves building components that link to system libraries. If you’re unsure which packages need reinstallation, you can check the package documentation or contact us for assistance.
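A rough way to find candidates for reinstallation is to look for installed distributions that ship shared-object files. The sketch below uses the standard library's `importlib.metadata`; the helper names are hypothetical, and the check is only a heuristic. It scans whichever Python environment runs it, so it will also flag conda-installed packages and prebuilt wheels that bundle their own libraries, which may not need reinstalling.

```python
from importlib import metadata
from typing import Iterable, List


def has_compiled_files(file_names: Iterable[str]) -> bool:
    """Return True if any file name looks like a shared object (.so or .so.N)."""
    for name in file_names:
        base = name.rsplit("/", 1)[-1]
        if base.endswith(".so") or ".so." in base:
            return True
    return False


def compiled_distributions() -> List[str]:
    """List installed distributions whose recorded files include shared objects."""
    names = set()
    for dist in metadata.distributions():
        # dist.files is None when the installer recorded no file list.
        files = dist.files or []
        if has_compiled_files(str(f) for f in files):
            name = dist.metadata["Name"]
            if name:
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    for name in compiled_distributions():
        print(name)
```

Treat the output as a starting list to check against package documentation, not as a definitive answer.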
We are performing a minor update to the Slurm scheduler.
Announcement
We are planning to do a minor software update on the SCF Slurm scheduler, which manages batch and JupyterHub jobs on the cluster, on Tuesday, March 4 at 1:00pm.
We anticipate that jobs that are currently running or queued on the cluster will experience no interruption. However, the ability to run or queue new jobs will be impacted for up to 60 minutes.
There is a very small possibility that currently running jobs may be interrupted. We will be taking every possible precaution to ensure this does not happen, and we will send an announcement immediately if it does.
Please email trouble@stat.berkeley.edu if you have any questions or concerns regarding this announcement.
Update
The SCF Slurm scheduler software update is now complete. The cluster is back online and accepting jobs. So far we have seen no indication that any of the already running jobs were impacted.
If you notice any new issues with Slurm, please contact trouble@stat.berkeley.edu with a description of the problem.
Standalone server arwen had a failed memory module.
Investigating 2025-02-20 10:13a
The login node arwen is having hardware issues and has locked up several times recently. It is offline while we investigate the problem.
Resolved 2025-02-21 4:06p
arwen is back online after memory stress tests completed without incident.
Investigating 2025-04-03 10:06a
arwen seems to be having RAM issues, possibly due to a hardware problem. We recommend that you use an alternative machine for the time being.
Monitoring 2025-04-03 2:06p
arwen is back online after we removed a memory module.
Electricity went out to many campus buildings, including Evans Hall.
Investigating 2025-01-04 9:27a
I just received word that there is a power outage affecting the north side of campus. Unfortunately this has hit our server rooms and all systems are down. I’ll respond with updates when I know more.
Monitoring 2025-01-04 11:06a
Power is still out in Evans Hall. Emergency lights are on, and the card readers are not functional.
Resolved 2025-01-04 4:00p
Campus issued WarnMe notifications that power was restored Saturday afternoon. Most machines were brought up by about 4pm, although some need local attention, so I’ll bring them up on Monday.