Processor counts, load balancing and memory management on Cheyenne and Geyser
IMPORTANT NOTE 1/1/2018: Geyser intra-node communication works correctly, but CISL is still working on the Open MPI build needed to support inter-node communication. For now, only one node is allowed on Geyser (-N 1).
All CESM postprocessing tools can be run on Cheyenne, or on Geyser either directly or via Cheyenne. create_postprocess creates a set of submission scripts for Cheyenne (PBS) and for Geyser (SLURM) in the $PP_CASE_PATH. Each submission script contains default settings, with best-guess processor and node counts and memory requirements for SLURM. These defaults are optimized for CMIP6 postprocessing and cylc workflow interaction.
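As a rough sketch, the SLURM stanza near the top of a Geyser submission script such as [compname]_averages_geyser looks something like the following; the values shown are illustrative placeholders, not the shipped defaults:

```
#!/bin/bash
# Illustrative SLURM stanza of the kind create_postprocess writes into $PP_CASE_PATH.
# All values are placeholders -- check the generated script for the real defaults.
#SBATCH -J atm_averages           # job name (hypothetical)
#SBATCH -n 16                     # total MPI tasks
#SBATCH -N 1                      # nodes; Geyser is currently limited to one node (see note above)
#SBATCH --ntasks-per-node=16      # MPI tasks placed on each node
#SBATCH --mem=100G                # memory per node, shared by all tasks on that node
#SBATCH -t 04:00:00               # wall-clock limit
```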
There may be instances when these defaults are not optimal for the postprocessing tasks at hand. For example, hi-resolution ocean data, long sea-ice timeseries diagnostics, or atmospheric data sets with many variables may require changes to the default batch submission stanzas. Some specific guidelines for modifying the batch submission settings are listed below.
For hi-resolution ocean data, long ice timeseries, or atmospheric data sets with many variables, set the netcdf_format XML variable to netcdfLarge. For [compname]_averages_geyser, the SBATCH settings -n, -N, --ntasks-per-node, --mem and -t may need adjustment in order to maximize the amount of memory available per MPI task. Geyser shared nodes allow up to 16 MPI tasks per node and 1000 GB of memory shared across those tasks.
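One way to change the netcdf_format XML variable is through the pp_config utility in $PP_CASE_PATH; the invocation below is an assumption about that utility's syntax (verify with ./pp_config --help), and editing the postprocessing XML file directly is an equivalent alternative:

```
cd $PP_CASE_PATH
# Assumed pp_config syntax for setting an XML variable -- verify before use.
./pp_config --set netcdf_format=netcdfLarge
./pp_config --get netcdf_format    # confirm the new value
```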
The parallelism in the PyAverager is a rationing scheme based on the number of variables to be averaged and the number of different averages to be computed; see the PyAverager README for details on which averages are computed for each CESM component. Consequently, if the variables are large or many years are being averaged, the optimal layout is to use as many shared Geyser nodes as possible (-N) while reducing the number of MPI tasks per node (-n and --ntasks-per-node) and increasing the amount of memory (--mem) available per task.
There is a trade-off between requesting more resources in a shared environment and getting through the queue in a timely manner.
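For example, a large [compname]_averages_geyser job could trade task count for memory per task as sketched below; the numbers are hypothetical and keep to the single-node Geyser restriction noted above:

```
# Hypothetical averager stanza favoring memory per task over task count
#SBATCH -N 1                      # single Geyser node while the inter-node restriction holds
#SBATCH -n 4                      # fewer MPI tasks overall ...
#SBATCH --ntasks-per-node=4       # ... and per node (down from the 16-task-per-node maximum)
#SBATCH --mem=800G                # memory per node, shared by the tasks on it
#SBATCH -t 06:00:00               # longer wall clock to offset the smaller task count
# With 4 tasks sharing 800 GB, each task has roughly 200 GB available,
# versus about 50 GB per task if 16 tasks shared the same request.
```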
For diagnostics, the SBATCH -n option should not exceed the number of plot sets to be created. The -N, --ntasks-per-node, -t and --mem settings may need to be adjusted depending on the size of the climatology files generated by the averager.
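As a sketch, a component producing 12 plot sets (a hypothetical count) might use:

```
# Hypothetical diagnostics stanza for a component producing 12 plot sets
#SBATCH -n 12                     # one task per plot set; extra tasks would sit idle
#SBATCH -N 1
#SBATCH --ntasks-per-node=12
#SBATCH --mem=300G                # size to the climatology files produced by the averager
#SBATCH -t 02:00:00
```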
For variable timeseries generation on Geyser, -n should be set according to the total number of history streams to be converted, allowing a minimum of 16 tasks per stream. The --mem setting may need to be adjusted depending on the number of years included in a single-variable timeseries "chunk" and the size of the variable. By default, the output single-variable timeseries files are written in netcdf4c format.
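For instance, a case writing 3 history streams (a hypothetical count) would need at least 3 x 16 = 48 tasks; the other values below are placeholders:

```
# Hypothetical timeseries stanza: 3 history streams x 16 tasks per stream = 48 tasks minimum
#SBATCH -n 48
#SBATCH --mem=400G                # scale with the chunk length (years per file) and variable size
#SBATCH -t 08:00:00
# -N and --ntasks-per-node must then be chosen to fit the node limits described above.
```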