2025-11-02: Tape Backups

Monthly tape backups this Sunday

We will be running the monthly backups on Sunday, November 2, beginning in the afternoon.

Computers will not be rebooted or made unavailable, and processes will not be terminated. You may experience slightly slower input/output performance until the dumps are complete.

Ryan Lovett

2025-09-19: Documentation on AI coding assistants

New website page.

We have a new documentation page on using AI coding assistants, particularly in connection with the SCF.

https://computing.stat.berkeley.edu/software/ai-coding/

Chris Paciorek

2025-09-18: Default cluster partition has changed.

Slurm configuration change.

We have set the default partition on the SCF cluster to be the high partition with a 28-day time limit.
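As a rough sketch of what this means for job submission, the Python snippet below builds an sbatch command line that either relies on the default partition or names it explicitly. The script name job.sh and the helper build_sbatch_command are hypothetical. The --partition and --time flags are standard sbatch options, and the time limit uses Slurm's days-hours:minutes:seconds format.

```python
import shlex


def build_sbatch_command(script, partition=None, time_limit=None):
    """Return an sbatch argument list, adding options only when given."""
    cmd = ["sbatch"]
    if partition is not None:
        cmd.append(f"--partition={partition}")
    if time_limit is not None:
        cmd.append(f"--time={time_limit}")
    cmd.append(script)
    return cmd


# Rely on the cluster's default partition (job.sh is a hypothetical script).
print(shlex.join(build_sbatch_command("job.sh")))

# Name the partition explicitly and request a limit shorter than 28 days.
print(shlex.join(build_sbatch_command("job.sh", partition="high",
                                      time_limit="3-00:00:00")))
```

Requesting a time limit shorter than the partition maximum is optional. The commands are only printed here, so the sketch can be run on a machine without Slurm.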

Chris Paciorek

2025-04-29: Tape Backups

Monthly tape backups this Sunday

We will be running the monthly backups on Sunday, May 4, beginning in the afternoon.

Computers will not be rebooted or made unavailable, and processes will not be terminated. You may experience slightly slower input/output performance until the dumps are complete.

Ryan Lovett

2025-04-25: Brief network maintenance

Short downtime to increase fileserver bandwidth.

We’ll be doing some maintenance on the network interfaces of our file server starting today at 2p, with the goal of increasing its network bandwidth. We anticipate that the outage will be very brief (one minute); however, there is a slim possibility that we may have to reboot the machine. Running jobs should not be terminated in either case.

Ryan Lovett

2025-04-12: Lambda head node unreachable

Lambda cluster unavailable due to unknown problem.

Investigating 2025-04-12

The lambda head node is not accessible. I’ve asked the data center folks (in Washington) to investigate.

Resolved 2025-04-16 11:00a

The lambda cluster’s head node was having network interface problems. The problems have cleared for the time being.

Ryan Lovett

2025-04-03: JupyterHub updates

Updated default Python kernel and the remote desktop feature.

I have updated the default Python kernel from 3.12 to 3.13 to match the login environment of our recently upgraded systems. The previous kernel is still available as “Python 3.12” in the JupyterLab launcher.

I’ve also updated the remote desktop feature. The latest version supports copy/paste and desktop resizing, has a link back to the hub, and uses a unix socket for security.

Ryan Lovett

2025-04-01: Python and R Upgrades

We have upgraded Python and R following the OS upgrades.

With the recent operating system upgrade to Ubuntu 24.04, we have also upgraded to Python 3.13 and R 4.4.3. Older versions are available via the module system.

Chris Paciorek

2025-03-21: Downtime notice: Cluster node OS upgrade

We are upgrading the operating system on most systems.

We will be performing operating system upgrades on all of our compute servers during the upcoming spring break. The upgrades will occur in batches from Monday, March 24, through Thursday, March 27, starting each morning at approximately 8:30am. We will begin with the least active nodes.

Your home and scratch directories will not be affected; however, /tmp and /var/tmp on each machine will be wiped. /data partitions, which exist on a small number of systems, will not be impacted.

We will place reservations on the systems to be upgraded about 72 hours in advance. If you currently have or plan to start jobs that you expect to run into next week, please ensure that your code can resume from where it left off if interrupted. For jobs starting before the upgrade, we recommend setting a time limit of no more than 72 hours.
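One way to make a job resumable is to save its state after each unit of work and reload that state at startup. The following is a minimal sketch under assumptions, not a prescribed method: the function names, the JSON state layout, and the stop_after argument (which only simulates an interruption) are all hypothetical. The checkpoint file should live in your home or scratch directory rather than /tmp, since /tmp will be wiped.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so an interruption cannot leave a partial file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, n_steps, stop_after=None):
    """Run steps starting from the checkpoint.

    stop_after is a hypothetical knob that simulates an interruption
    after that many steps in this invocation.
    """
    state = load_checkpoint(path)
    done = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done >= stop_after:
            break
        state["total"] += state["next_step"]  # placeholder for real work
        state["next_step"] += 1
        save_checkpoint(path, state)
        done += 1
    return state
```

Calling run() a second time with the same checkpoint path picks up at the first unfinished step instead of starting over.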

Self-Installed Packages

If you have installed your own R packages, you will need to reinstall them after the upgrade or request that we install them system-wide. Your current packages are in versioned subdirectories of ~/R/x86_64-pc-linux-gnu-library. Although these directories will remain after the upgrade, R on the new system will look in ~/R/x86_64-pc-linux-gnu-library-ubuntu-24.04/. This is necessary because compiled R packages link against system libraries from the old OS, which may be missing or have different versions on the new OS. To request system-wide installation, please contact us at consult@stat.berkeley.edu.
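To see which R packages you may need to reinstall, you can list the contents of the old library directory. The Python sketch below is one possible way to do that. The helper name list_user_r_packages is hypothetical, and it assumes each installed package is a directory containing a DESCRIPTION file, which is how R lays out installed packages.

```python
from pathlib import Path


def list_user_r_packages(library_root):
    """Map each versioned subdirectory to the R packages found inside it."""
    root = Path(library_root).expanduser()
    packages = {}
    if not root.is_dir():
        return packages
    for version_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # An installed R package directory contains a DESCRIPTION file.
        names = sorted(
            p.name for p in version_dir.iterdir()
            if (p / "DESCRIPTION").is_file()
        )
        packages[version_dir.name] = names
    return packages


# Path taken from the announcement; prints nothing if it does not exist.
for version, names in list_user_r_packages("~/R/x86_64-pc-linux-gnu-library").items():
    print(version, ", ".join(names))
```

The printed names can serve as a checklist when reinstalling on the upgraded system or when asking for a system-wide installation.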

Python packages installed via conda are not affected. If you have installed Python packages using pip and the installation involved compiling shared objects, those packages will need to be reinstalled. Compiling shared objects typically means that the installation builds components that link to system libraries. If you’re unsure which packages need reinstallation, you can check the package documentation or contact us for assistance.
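As a rough first pass at finding candidates for reinstallation, the sketch below uses the standard library's importlib.metadata to list installed distributions whose recorded files include shared objects. The helper names are hypothetical, and matching on a ".so" file name is a heuristic assumption. The script reports every distribution in the current environment regardless of how it was installed, so treat the result as a list to review, not a definitive answer.

```python
from importlib import metadata


def has_compiled_files(file_names):
    """Return True if any file name looks like a shared object (.so)."""
    return any(
        str(name).endswith(".so") or ".so." in str(name)
        for name in file_names
    )


def distributions_with_shared_objects():
    """List installed distributions whose file records include shared objects."""
    found = set()
    for dist in metadata.distributions():
        files = dist.files or []  # files is None when no file record exists
        name = dist.metadata.get("Name")
        if name and has_compiled_files(files):
            found.add(name)
    return sorted(found)


for name in distributions_with_shared_objects():
    print(name)
```

Run it with the Python interpreter whose packages you want to check, since the results reflect only that interpreter's environment.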

Dan Ackerman

2025-03-03: Maintenance reminder: Slurm scheduler software update

We are performing a minor update to the Slurm scheduler.

Announcement

We are planning to do a minor software update on the SCF Slurm scheduler, which manages batch and JupyterHub jobs on the cluster, on Tuesday, March 4 at 1:00pm.

We anticipate that jobs that are currently running or queued on the cluster will experience no interruption. However, the ability to run or queue new jobs will be impacted for up to 60 minutes.

There is a very small possibility that currently running jobs may be interrupted. We will be taking every possible precaution to ensure this does not happen, and we will send an announcement immediately if it does occur.

Please email trouble@stat.berkeley.edu if you have any questions or concerns regarding this announcement.

Update

The SCF Slurm scheduler software update is now complete. The cluster is back online and accepting jobs. So far we have seen no indication that any of the already running jobs were impacted.

If you notice any new issues with Slurm, please contact trouble@stat.berkeley.edu with a description of the problem.

Dan Ackerman

2025-02-20: arwen RAM problems

Standalone server arwen had a failed memory module.

Investigating 2025-02-20 10:13a

The login node arwen is having hardware issues and has locked up several times recently. It is offline while we investigate the problem.

Resolved 2025-02-21 4:06p

arwen is back online after memory stress tests completed without incident.

Investigating 2025-04-03 10:06a

arwen seems to be having RAM issues, possibly due to a hardware problem. We recommend that you use an alternative machine for the time being.

Monitoring 2025-04-03 2:06p

arwen is back online after we removed a memory module.

Ryan Lovett

2025-02-06: Unexpected downtime: Evans Hall networks down

A brief network outage hit Evans Hall.

Investigating 2025-02-06 11:30p

Several wired networks in Evans Hall are down at the moment. I’ve reported it to campus.

Resolved 2025-02-07 12:28a

The networks started to come back online a few minutes after midnight.

Ryan Lovett

2025-01-04: Unexpected downtime: Campus power outage

Electricity went out to many campus buildings, including Evans Hall.

Investigating 2025-01-04 9:27a

I just received word that there is a power outage affecting the north side of campus. Unfortunately, this has hit our server rooms and all systems are down. I’ll respond with updates when I know more.

Monitoring 2025-01-04 11:06a

Power is still out in Evans Hall. Emergency lights are on, and the card readers are not functional.

Resolved 2025-01-04 4:00p

Campus issued WarnMe notifications that power was restored Saturday afternoon. Most machines were brought up by 4p or so, although some need local attention, so I’ll bring them up Monday.

Ryan Lovett