Solo Production Cluster

This page contains high-level information about the Solo cluster, a non-restricted HPC platform for use at SNL.

Solo

The Solo cluster adds 13,464 compute cores to Sandia’s High Performance Computing (HPC) capability. At nearly 460 teraFLOPS, Solo roughly doubles the capacity of our existing OHPC computing environment. Sandia applied Institutional Computing investment funds to purchase the system from Penguin Computing. Solo is based on the Tundra Open Rack Solution generation of hardware found in the highly effective TLCC3 (Tri-Lab Linux Capacity Cluster) procurement.

Solo serves all Program Management Units and Mission Areas as a corporate computational simulation resource.

Solo is an institutional computing resource available to all users and projects.

  • Design: Solo was one of SNL’s first DOE NNSA ASC Commodity Technology Systems (CTS-1) procurements.
  • Processor: Each of Solo’s 374 compute nodes contains a dual-socket motherboard, and each socket holds a 2.1 GHz Intel Xeon Broadwell (BDW) E5-2695v4 18-core CPU, for a total of 36 cores per node.
  • Interconnect: Intel Omni-Path high-speed interconnect, with Intel OPA hardware that includes node HFIs, Edge and Core switches, and Mellanox ConnectX-4 HCAs in the I/O nodes.
  • Operating System: Tri-Lab Operating System Software (TOSS3), a common cluster management software stack based on Red Hat Enterprise Linux 7.
  • Vendor: Penguin Computing supplied this system.

Requesting Accounts

If you already have the "OHPC Capacity Cluster" account, you will automatically be added to Solo.

Otherwise, all MoW account requests are submitted via WebCARS.

Users from Tri-Labs, NWC, and ASC University Alliance Partners may submit account requests via SARAPE.

Login & Compile Nodes

The Solo cluster has two login nodes, solo-login[1-2], which serve both as access points to the cluster and as the compile environment. The login nodes have the same hardware configuration as the compute nodes.

Please do not run jobs or applications directly on the login node!

Logging into the Solo system

Use "ssh" to log into:

  • alias "solo" or "solo-login[1-2]"
  • login with your username and Yubikey PIN and OTP output
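
A typical session looks like the following sketch. The username "jdoe" is a placeholder, and the exact OTP prompt text may differ on your system:

  $ ssh jdoe@solo
  PIN + YubiKey OTP:            (type your PIN, then touch the YubiKey)
  [jdoe@solo-login1 ~]$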

File Systems

/home should be used primarily to store input decks and other small data.

Neither /home nor /projects should be used to write output from job runs on Solo.

We encourage you to write job output to /qscratch or /gpfs instead (see the example after the list below).

The following file systems are on Solo:

  • /home:      100 TB
  • /projects:  100 TB
  • /qscratch:  3.7 PB
  • /gpfs:      4.1 PB
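
Before a large run, you can confirm a scratch mount and its free space with standard tools; the per-user output directory below is a placeholder:

  $ df -h /qscratch                  # show capacity and free space on the scratch file system
  $ mkdir -p /qscratch/$USER/myrun   # create a per-user output directory ("myrun" is a placeholder)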

Using Modules

Modules provide a convenient way to set up the environment needed to run codes. Each machine has a large set of available modules, and a default set is loaded for you when you log in. LMOD replaces the modules system used under TOSS2; it offers more options and is more user friendly.

These default modules set up the basic environment that you need for building and running codes.

Modules also provide an easy way to switch to a non-default environment when needed, and to access various specialized libraries such as fftw and MKL.
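
For example, the following LMOD commands show the loaded set and switch environments; the specific module names are illustrative and may differ on Solo:

  $ module list              # show currently loaded modules
  $ module avail             # list all available modules
  $ module load fftw         # load a specialized library (name illustrative)
  $ module swap intel gnu    # switch compiler environments (names illustrative)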

There are two main sections in the user environment under TOSS3:

  • /apps/modules/modulefiles
  • /opt/modules/modulefiles

The module names and the environment variables they set differ between these two trees.

For example, modules under /opt/modules/modulefiles set the environment variables FFTW_LIBS and FFTW_INCLUDES instead of the FFTW_LIB and FFTW_INCLUDE variables that users may be accustomed to.
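
A quick way to check which variables a particular module sets is "module show"; the module name here is illustrative:

  $ module show fftw         # print the environment changes the module makes
  $ echo $FFTW_LIBS          # inspect a variable after loading the /opt tree's module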

The compilers and MPI that are set up as defaults are the most recent versions that we have tested, but there will usually be more recent versions available on the system as well.

We don’t change the default environment very often in order to minimize disruptions to users’ workflow.

Users may want to keep abreast of the latest versions on the system (by checking the output of "module avail"), since newer versions sometimes provide better performance.
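
To narrow the listing to a single package, pass its name as an argument; "openmpi" is illustrative:

  $ module avail openmpi     # list installed versions of one package
  $ module spider openmpi    # search for the package across all module trees (LMOD)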

Learning More About Environment Modules

More information about Environment Modules is available at the Environment Modules project homepage.

Note - Export Controlled Information (ECI) and Unclassified Controlled Information (UCI) are NOT allowed on Solo.