Zhum #25

Containerization-HPC.md (159 changes: 129 additions, 30 deletions)

The aim of this task is to build an HPC-compatible container (i.e. [Singularity](https://sylabs.io/guides/3.5/user-guide/introduction.html)) and test its performance in comparison with a native installation (no containerization) for a set of distributed-memory calculations.

# Solution

Here we use AWS ParallelCluster (https://docs.aws.amazon.com/parallelcluster/latest/ug/what-is-aws-parallelcluster.html). An AWS account is needed. We assume a Linux environment with command-line tools; the AWS CLI is expected to be installed.

**Note**: the AWS account should have appropriate permissions (TO BE ADDED).

## Steps

1. Install the pcluster tool (v2 is used here): `pip3 install "aws-parallelcluster<3.0" --upgrade --user`
2. Run the initial configuration: `aws configure; pcluster configure`. Slurm is the preferred scheduler.
3. Create a cluster: `pcluster create my-hpc-cluster`
4. Log in to the cluster: `pcluster ssh my-hpc-cluster`
5. Install Singularity from the GitHub releases page (https://github.com/sylabs/singularity/releases/): `wget https://github.com/sylabs/singularity/releases/download/v3.9.8/singularity-ce_3.9.8-focal_amd64.deb; sudo apt install ./singularity-ce_3.9.8-focal_amd64.deb`
6. Copy and edit the recipe file (see [hpl1.def](hpl1.def))
7. Build a Singularity image: `sudo singularity build my-image.sif my-recipe.def` (note the **sudo** usage).
8. Run the application like this: `sbatch -n NUM_CPUS ./sing-run.sh --bind INPUT:/INPUT my-image.sif /path/to/app/in/image`. Here `--bind ...` is optional: it passes the input file into the container, can be specified multiple times, and can also bind directories.

`sing-run.sh` can be modified for custom needs.
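
For orientation, here is a minimal sketch of what a wrapper like `sing-run.sh` might contain. This is an assumption for illustration only; the actual script used is attached as [sing-run.sh](sing-run.sh).

```
#!/bin/bash
# Hypothetical sketch of a sing-run.sh wrapper: every argument passed after the
# script name on the sbatch command line is handed straight to "singularity exec",
# so "sbatch -n 72 ./sing-run.sh --bind INPUT:/INPUT my-image.sif /path/to/app"
# runs the application under MPI inside the container.
export PATH=$PATH:/opt/amazon/openmpi/bin   # Amazon OpenMPI, as in the recipe file
mpirun singularity exec "$@"
```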

**Important note**: do not use the stock OpenMPI; use the Amazon EFA toolkit and Amazon's OpenMPI build instead. See the example recipe file ([hpl1.def](hpl1.def)) for details.
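
A quick way to check that the Amazon build is the one being picked up (assuming the default ParallelCluster layout with OpenMPI under `/opt/amazon/openmpi`, as used in the recipe):

```
# Put the Amazon OpenMPI build first on PATH and verify which mpirun is used
export PATH=/opt/amazon/openmpi/bin:$PATH
which mpirun      # should point into /opt/amazon/openmpi/bin
mpirun --version
```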

## Options

You can prepare your Singularity image on another AWS instance or on your own computer, but it should be optimized for the target AWS instance's processor (AVX-512 preferred).
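
One way to confirm that the build host and the target instance support the same vector extensions (an illustrative check, not part of the attached scripts):

```
# Run on both the build host and the target instance and compare the output
lscpu | grep -o 'avx[0-9a-z_]*' | sort -u
```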

You can also set up the cluster without Slurm and run the application via `mpirun` directly.
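
A sketch of such a direct run, assuming a `hosts` file listing the compute nodes' private IPs and an image `hpl.sif` built from the recipe below on a shared path (the names and rank count are illustrative):

```
# Launch 72 ranks across the nodes listed in ./hosts, without a scheduler
/opt/amazon/openmpi/bin/mpirun --hostfile hosts -np 72 \
    singularity exec hpl.sif /bin/xhpl
```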

You can also set up the cluster without ParallelCluster, using Terraform and Salt/Ansible, but the details still need to be worked out (see the Terraform approach notes below).

## Example for HPL

Below is the commented hpl1.def file; use it as a template for new images.

```
Bootstrap: docker
# use Amazon Linux, which is CentOS-based. If you use Ubuntu or Alpine, adjust the package management commands.
From: amazonlinux

# files to be copied into container
%files
hpl-2.3.tar.gz
Make.t1

# default environment, add amazon openmpi path
%environment
export LC_ALL=C
export PATH=$PATH:/opt/amazon/openmpi/bin

# prepare our image
%post
export PATH=$PATH:/opt/amazon/openmpi/bin

# install gcc etc
yum groupinstall 'Development Tools' -y

# install additional tools and atlas lib (for HPL)
yum install -y make tar curl sudo atlas atlas-devel

# Ubuntu option (package names differ; ATLAS is packaged as libatlas-base-dev)
# apt install -y libatlas-base-dev make tar curl

# get and install amazon efa tools. Important for MPI applications
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-1.15.1.tar.gz
tar -xf aws-efa-installer-1.15.1.tar.gz && pushd aws-efa-installer
sudo ./efa_installer.sh -y
popd

#
# Here is the HPL-oriented part. You can use other commands to build and install your own app
# build HPL in the /hpl directory, then we'll delete it
mkdir /hpl
tar xfz hpl-2.3.tar.gz -C /hpl
rm hpl-2.3.tar.gz
cp Make.t1 /hpl/hpl-2.3
cd /hpl/hpl-2.3

# use the custom-prepared makefile (Make.t1)
make arch=t1

# copy the compiled xhpl binary out of the build tree
mv bin/t1/xhpl /bin
cd /

# clean
rm -rf /hpl
```
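
A possible build-and-submit sequence using this recipe (the image name and process count are illustrative; 72 ranks matches the 9 x 8 process grid used in the testing results below):

```
# Build the image (note sudo), then submit an HPL run.
# Assumes xhpl was installed to /bin/xhpl as in the recipe and that HPL.dat
# sits in the submission directory, which Singularity mounts by default.
sudo singularity build hpl.sif hpl1.def
sbatch -n 72 ./sing-run.sh hpl.sif /bin/xhpl
```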

## Testing results

See [singularity-hpl.out](singularity-hpl.out) and [raw-hpl.out](raw-hpl.out) for the Singularity and native HPL runs respectively. A short summary is below (columns: T/V, N, NB, P, Q, time in seconds, Gflops); on the largest problem size the containerized run is within about 2% of the native one:

```
RAW HPL:

WR10L2L2 5000 512 9 8 1.73 4.8320e+01
WR10L2L2 20000 512 9 8 22.08 2.4160e+02
WR10L2L2 12128 512 9 8 7.07 1.6818e+02

Singularity HPL:

WR10L2L2 5000 512 9 8 1.79 4.6495e+01
WR10L2L2 20000 512 9 8 22.52 2.3682e+02
WR10L2L2 12128 512 9 8 7.09 1.6776e+02
```

## Terraform approach notes

Here are my attempts to set up the cluster with Terraform. TODO: the Salt config should be added.

Yes, after installing the EFA toolkit the Terraform-based cluster works too. The [terraform.tgz](terraform.tgz) file contains the Terraform configs used. I took them from https://github.com/bugbiteme/demo-tform-aws-vpc and slightly modified them for this task.

The tasks below should be executed via SaltStack (or Ansible, or ...), but I did them manually. I need some time to remember how to use Salt :)

After the Terraform run we need to get all the nodes' IP addresses on the internal network (I don't know how to do this automatically yet), then create the Slurm configuration file (the final [slurm.conf](slurm.conf) is attached). Then we need to install the Slurm packages on the head and compute nodes, copy `slurm.conf` into `/etc/slurm-llnl/` (on all nodes), enable and start the slurmctld service on the head node and the slurmd service on the compute nodes.
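
One possible way to collect the private IPs with the AWS CLI (the tag filter is an assumption; adjust it to whatever tags the Terraform configs actually set, or expose the addresses as a Terraform output instead):

```
# List private IPs of the running cluster nodes, to be pasted into slurm.conf
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=hpc-*" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].PrivateIpAddress' \
  --output text
```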

Then we need to share `/home` on the head node via NFS: put `/home 10.0.0.0/8(rw,no_root_squash)` into `/etc/exports` and run `exportfs -r`. After that, mount it on the compute nodes: put `head-node-ip:/home /home nfs rw,defaults,_netdev 0 0` into `/etc/fstab`, then run `mount /home`.
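
The same steps as shell commands (a sketch; it assumes the NFS server and client packages are already installed on the respective nodes):

```
# On the head node: export /home to the VPC range
echo '/home 10.0.0.0/8(rw,no_root_squash)' | sudo tee -a /etc/exports
sudo exportfs -r

# On each compute node: mount it (replace HEAD_NODE_IP with the head node's private IP)
echo 'HEAD_NODE_IP:/home /home nfs rw,defaults,_netdev 0 0' | sudo tee -a /etc/fstab
sudo mount /home
```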

Install the EFA tools on all nodes if we need to run native (not containerized) applications.
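
These are the same EFA installer commands used in the recipe, run on the host this time:

```
curl -O https://efa-installer.amazonaws.com/aws-efa-installer-1.15.1.tar.gz
tar -xf aws-efa-installer-1.15.1.tar.gz
cd aws-efa-installer && sudo ./efa_installer.sh -y
```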

Ok! Our cluster is ready.

## Attached files

- [hpl1.def](hpl1.def) - Singularity recipe to build an image
- [HPL.dat](HPL.dat) - Linpack input sample data
- [Make.t1](Make.t1) - makefile for Linpack (note the path to ATLAS; on Ubuntu it will be different)
- [raw-hpl.out](raw-hpl.out) - output of the native HPL run
- [singularity-hpl.out](singularity-hpl.out) - output of the Singularity HPL run
- [results.txt](results.txt) - short results for comparison (native and Singularity HPL)
- [xhpl.sh](xhpl.sh) - batch script for SLURM to run the Singularity xhpl
- [zhpl.sh](zhpl.sh) - batch script for SLURM to run the native xhpl
- [sing-run.sh](sing-run.sh) - batch script for SLURM to run any Singularity container (just add the singularity options needed after `exec`)
- [terraform.tgz](terraform.tgz) - Terraform configs
- [slurm.conf](slurm.conf) - sample Slurm config file; IP addresses should be replaced with appropriate ones

# User story

As a user of this pipeline I can:

- build an HPC-compatible container for an HPC executable/code
- run test calculations to assert working state of this container
- (optional) compare the behavior of this container with an OS-native installation

# Notes

- Commit early and often

# Suggestions

We suggest:

- using AWS as the cloud provider
- using Exabench as the source of benchmarks: https://github.com/Exabyte-io/exabyte-benchmarks-suite
- using CentOS or similar as operating system
- using SaltStack or Terraform for infrastructure management
HPL.dat (32 additions, new file)
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
1000 16129 16128 Ns
1 # of NBs
512 384 640 768 896 960 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
1 2 2 Ps
2 1 2 Qs
16.0 threshold
1 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 2 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 0 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
192 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
1 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)

Make.t1 (186 additions, new file)
#
# -- High Performance Computing Linpack Benchmark (HPL)
# HPL - 2.3 - December 2, 2018
# Antoine P. Petitet
# University of Tennessee, Knoxville
# Innovative Computing Laboratory
# (C) Copyright 2000-2008 All Rights Reserved
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at the University of
# Tennessee, Knoxville, Innovative Computing Laboratory.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = t1
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = /hpl/hpl-2.3
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir =
# /usr/local/mpi
MPinc =
# -I$(MPdir)/include
MPlib =
# $(MPdir)/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /usr/lib64/atlas/
LAinc =
LAlib = $(LAdir)/libsatlas.so.3
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
HPL_OPTS = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = mpicc
LINKFLAGS = $(CCFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------