Hardware

The SCF operates one GPU available to all SCF users on an equal basis and many other GPUs purchased by research groups that are available to group members at regular priority and to other SCF users at lower (preemptible) priority.

You must use the Slurm scheduling software to run any job that uses a GPU. You may want to use an interactive session to develop and test your GPU code. The documentation on interactive sessions also has information on monitoring your job's GPU usage.

General Access GPUs

GPU partition

We have one Titan Xp with 12 GB memory on one of our Linux servers (roo), available through the gpu partition.

Savio

Access to GPUs on Savio is available through the Savio faculty computing allowance. Please contact SCF staff for more information.

Research Group GPUs

The SCF also operates the following research group GPUs. These GPUs are owned by individual faculty members, but anyone can run jobs on them. If you are not a member of the lab group, your jobs will run on a preemptible basis, which means they can be cancelled at any time by a higher-priority job. To use these servers, submit your job to the partition of interest using the Slurm scheduling software.
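As a rough illustration of submitting to a specific partition, the sketch below assembles an `sbatch` command line in Python. The `--partition` and `--gres=gpu:N` options are standard Slurm flags, but the helper name `build_sbatch_command` and the job script `train.sh` are hypothetical, and the SCF may recommend different or additional flags, so treat this as a starting point and not as the SCF's prescribed invocation.

```python
import shlex


def build_sbatch_command(partition, script, gpus=1):
    """Return the argument list for submitting `script` to `partition`."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return [
        "sbatch",
        f"--partition={partition}",  # Slurm partition to submit to
        f"--gres=gpu:{gpus}",        # number of GPUs requested
        script,                      # hypothetical job script
    ]


# Build (but do not run) a submission to a research group partition.
cmd = build_sbatch_command("jsteinhardt", "train.sh", gpus=2)
print(shlex.join(cmd))
```

On a machine where Slurm is installed, the list could be passed to `subprocess.run(cmd, check=True)` to submit the job. Jobs from non-members submitted this way would still be preemptible, as described above.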

See the first table below for information about the GPU servers and the second table for more detailed information related to local disk, GPU-to-GPU interconnect, and location/latency.

With regard to latency, note that some GPU machines are located at the NASA Ames facility, approximately 75 km from Berkeley, while the SCF fileservers hosting home and scratch directories (via NFS) are located in Berkeley. This distance results in a latency of about two milliseconds, which can cause slowness and laggy behavior when working with many (often small) files, including Conda/Mamba-related work. Local disks are available to group members to help work around this problem.
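To see why distance matters for workloads that touch many small files, the following back-of-the-envelope sketch computes a physical lower bound on one network round trip over 75 km and multiplies it by a number of sequential file operations. The fiber signal speed (about two thirds of the speed of light) and the count of 50,000 operations are assumptions chosen for illustration, and real latency will be higher than this floor because of routing and protocol overhead.

```python
# Assumption: signals in optical fiber travel at roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_floor_s(distance_km, speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Lower bound on one network round trip over the given distance."""
    return 2.0 * distance_km / speed_km_per_s


def serial_wait_s(n_operations, per_op_latency_s):
    """Total wait when each file operation needs one sequential round trip."""
    return n_operations * per_op_latency_s


floor = round_trip_floor_s(75.0)      # 75 km, from the text above
total = serial_wait_s(50_000, floor)  # hypothetical number of small-file operations

print(f"round-trip floor: {floor * 1e3:.2f} ms")
print(f"50,000 sequential operations: {total:.1f} s")
```

Even at this idealized floor, tens of thousands of sequential NFS operations add up to a wait measured in tens of seconds, which is why placing such files on a local disk helps.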

GPU server specs

| Partition | Machine Name | GPU Type (Number of GPUs) | GPU Memory |
|---|---|---|---|
| jsteinhardt | cubbins[1] | H200 (8) | 144 GB |
| jsteinhardt | mcfuzz[1] | H200 (8) | 144 GB |
| jsteinhardt | mooney[1] | H200 (8) | 144 GB |
| jsteinhardt | sneetches[1] | H200 (8) | 144 GB |
| jsteinhardt | balrog | A100 (8) | 40 GB |
| jsteinhardt | saruman | A100 (10) | 80 GB |
| jsteinhardt | rainbowquartz | A5000 (8) | 24 GB |
| jsteinhardt | smokyquartz | A4000 (8) | 16 GB |
| jsteinhardt | sunstone | A4000 (8) | 16 GB |
| jsteinhardt | smaug | Quadro RTX 8000 (1) | 48 GB |
| jsteinhardt | shadowfax | GeForce RTX 2080 Ti (8) | 11 GB |
| yugroup | treebeard | A100 (1) | 40 GB |
| yugroup | merry | GeForce GTX TITAN X (1) | 12 GB |
| yugroup | morgoth | Titan Xp (1) | 12 GB |
| yugroup | morgoth | Titan X (Pascal) (1) | 12 GB |
| yss | luthien | A100 (4) | 80 GB |
| yss | beren | A100 (8) | 80 GB |
| songmei | feanor[1] | H200 (8) | 144 GB |
| berkeleynlp | horton[1] | H200 (8) | 144 GB |
| berkeleynlp | lorax[1] | H200 (8) | 144 GB |
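When choosing a partition, it can help to filter the servers by GPU memory. The sketch below encodes a handful of rows from the table above and selects those meeting a minimum memory requirement. The `GpuServer` type and the `servers_with_min_memory` helper are hypothetical names, only a subset of rows is included, and the code assumes the GPU Memory column is a per-GPU figure.

```python
from typing import NamedTuple


class GpuServer(NamedTuple):
    partition: str
    machine: str
    gpu_type: str
    gpu_count: int
    gpu_memory_gb: int  # assumed to be memory per GPU


# A subset of rows copied from the table above; extend as needed.
SERVERS = [
    GpuServer("jsteinhardt", "balrog", "A100", 8, 40),
    GpuServer("jsteinhardt", "smaug", "Quadro RTX 8000", 1, 48),
    GpuServer("yugroup", "treebeard", "A100", 1, 40),
    GpuServer("yss", "luthien", "A100", 4, 80),
    GpuServer("yss", "beren", "A100", 8, 80),
    GpuServer("songmei", "feanor", "H200", 8, 144),
]


def servers_with_min_memory(servers, min_gb):
    """Return the servers whose GPUs have at least `min_gb` of memory."""
    return [s for s in servers if s.gpu_memory_gb >= min_gb]


for server in servers_with_min_memory(SERVERS, 80):
    print(f"{server.partition}: {server.machine} ({server.gpu_type}, {server.gpu_memory_gb} GB)")
```

The partition name in each match is what would be passed to Slurm when submitting the job.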

GPU server additional specs

| Partition | Machine Name | GPU-to-GPU Interconnect | Local storage[2] | Location |
|---|---|---|---|---|
| jsteinhardt | cubbins[1] | NVSwitch | 14 TB NVMe | NASA Ames[3] |
| jsteinhardt | mcfuzz[1] | NVSwitch | 14 TB NVMe | NASA Ames[3] |
| jsteinhardt | mooney[1] | NVSwitch | 14 TB NVMe | NASA Ames[3] |
| jsteinhardt | sneetches[1] | NVSwitch | 14 TB NVMe | NASA Ames[3] |
| jsteinhardt | balrog | NVLink (pairs) | 3.5 TB spinning | Berkeley |
| jsteinhardt | saruman | NVLink (pairs) | 7 TB NVMe | Berkeley |
| jsteinhardt | rainbowquartz | A5000 (8) | 3.5 TB NVMe | Berkeley |
| jsteinhardt | smokyquartz | A4000 (8) | 3.5 TB NVMe | Berkeley |
| jsteinhardt | sunstone | A4000 (8) | 3.5 TB NVMe | Berkeley |
| jsteinhardt | smaug | NVLink (pairs) | 2 TB NVMe | Berkeley |
| jsteinhardt | shadowfax | None | 3.6 TB spinning | Berkeley |
| yugroup | treebeard | N/A | | Berkeley |
| yugroup | merry | N/A | | Berkeley |
| yugroup | morgoth | N/A | | Berkeley |
| yugroup | morgoth | N/A | | Berkeley |
| yss | luthien | None | | Berkeley |
| yss | beren | NVLink (pairs) | | Berkeley |
| songmei | feanor[1] | NVSwitch | 6.6 TB NVMe | NASA Ames[3] |
| berkeleynlp | horton[1] | NVSwitch | 56 TB NVMe | NASA Ames[3] |
| berkeleynlp | lorax[1] | NVLink (pairs) | 56 TB NVMe | NASA Ames[3] |

Footnotes
  1. Requires the fully qualified domain name when connecting, i.e., ssh {hostname}.stat.berkeley.edu.

  2. Storage available at /data to group members. In addition, all machines have hundreds of GB of spinning-disk storage on local /tmp and /var/tmp, available to all users.

  3. Note that some GPU machines are located at the NASA Ames facility, approximately 75 km from Berkeley, while the SCF fileservers hosting home and scratch directories (via NFS) are located in Berkeley. This distance causes a latency of about two milliseconds, which can cause slowness and laggy behavior when working with many (often small) files, including Conda/Mamba-related work. Local disks are available to group members to help work around this problem.