Python framework for evaluating Foundation Models (FM).
This framework consists of a container that hosts the dependencies required for an extendable collection of models, along with snapshots of model source code, supporting inference scripts, and configuration files that specify runtime parameters such as data paths and model tunings. Example runs illustrate how to invoke the scripts that execute model inference.
NOTE: The initial version of this project is deployed with restrictions:
- The container can be deployed on any platform with Singularity or Docker; however, the associated model checkpoints and statistics files are not included.
- To run the canned Python/Bash scripts, the user must log into the Discover cluster and execute the runtime scripts described below.
- All paths reflect a static Discover environment, referencing both fully qualified and relative paths to the input data.
- To change default parameters, a copy of the runtime scripts should be made by the user and modified accordingly.
- Scripts and configuration files are hard-coded with parameters for a specific Discover invocation; most originated in the separate model projects and were tweaked to run in this environment.
- Each FM is entirely independent and has a unique runtime signature.
- Output formats vary across FMs.
- The development team can provide nominal runtime assistance, but not FM model architecture expertise.
- Library to process FMs using GPU and CPU parallelization.
- Machine Learning and Deep Learning inference applications.
- Example scripts for a quick AI/ML start with your own data.
- Glenn Tamkin: glenn.s.tamkin@nasa.gov
- Jian Li: jian.li@nasa.gov
- Jordan Alexis Caraballo-Vega: jordan.a.caraballo-vega@nasa.gov
This User Guide reflects instructions for running inference scripts on Discover only.
Allocate a GPU before running the inference scripts:
salloc --gres=gpu:1 --mem=60G --time=1:00:00 --partition=gpu_a100 --constraint=rome --ntasks-per-node=1 --cpus-per-task=10

To run each of the Foundation Model tasks with qefm-core, change directories to the qefm-core root directory and run the inference script:
module load singularity
cd <Root directory>
./tests/fm-inference.sh <Container name> <Foundation Model name>

To run a specific Foundation Model task with qefm-core, use the following command:
./tests/fm-inference.sh <Container name> <Foundation Model name>

| Command-line argument | Description | Required/Optional/Flag | Default | Example |
|---|---|---|---|---|
| <Root directory> | Path to qefm-core installation | Required | N/A | /discover/nobackup/projects/QEFM/qefm-core |
| <Container name> | Name of Singularity container image (or sandbox) | Required | N/A | qefm-core-all.sif |
| <Foundation Model name> | Short title of Foundation Model | Required | N/A | gencast, aifs, aurora, fourcastnet, graphcast, pangu, privthi, sfno |
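As an illustration only (this wrapper is not part of the shipped scripts), the per-model invocations can be driven from a small shell loop over the short names listed in the table; the actual inference call is commented out here so the sketch is safe to run anywhere:

```shell
# Hypothetical wrapper loop over the supported short model names; assumes the
# qefm-core root directory and container image shown in the examples below.
for model in gencast aifs aurora fourcastnet graphcast pangu privthi sfno; do
    echo "Running inference for: $model"
    # ./tests/fm-inference.sh qefm-core-all.sif "$model"   # uncomment on Discover
done
```

Since each FM has a unique runtime signature and output format, check each model's results individually rather than assuming a uniform layout.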
Navigate to Root directory on Discover:
cd /discover/nobackup/projects/QEFM/qefm-core

Run Inference for GenCast Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif gencast

Run Inference for AIFS Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif aifs

Run Inference for Aurora Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif aurora

Run Inference for Fourcastnet Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif fourcastnet

Run Inference for Pangu Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif pangu

Run Inference for Privthi Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif privthi

Run Inference for GraphCast Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif graphcast

Run Inference for SFNO Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif sfno

Run Inference for All Foundation Models:

./tests/fm-inference.sh qefm-core-all.sif ensemble

Since Singularity caches the container image when invoked, it is important to specify the location of this cache to avoid running out of disk space. If running on Discover, /lscratch is a convenient place to create a directory path to use as a cache. See the example below for setting the appropriate environment variables:
export APPTAINER_TMPDIR=/lscratch/tdirs/gt-scratch/.cache
export APPTAINER_CACHEDIR=/lscratch/tdirs/gt-scratch/.cache
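The cache directory must exist before Singularity uses it. A minimal sketch, assuming a user-writable scratch path (CACHE_ROOT below is a stand-in for a path such as the /lscratch example above; substitute your own directory):

```shell
# Create the Apptainer/Singularity cache directory before first use.
# CACHE_ROOT is a placeholder; on Discover, point it at your /lscratch area.
CACHE_ROOT="${TMPDIR:-/tmp}/qefm-cache"
mkdir -p "$CACHE_ROOT"
export APPTAINER_TMPDIR="$CACHE_ROOT"
export APPTAINER_CACHEDIR="$CACHE_ROOT"
echo "Apptainer cache: $APPTAINER_CACHEDIR"
```

Both variables may point at the same directory, as in the exports above; what matters is that the location has enough free space to hold the unpacked container image.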