nasa-nccs-hpda/qefm-core

Quantitative Evaluation of Foundation Models

Python framework for evaluating Foundation Models (FM).

Documentation

qefm-core

This framework consists of a container that hosts the dependencies required for an extendable collection of models. Snapshots of each model's source code are captured along with supporting inference scripts, and configuration files specify runtime parameters such as data paths and model tunings. Example runs illustrate how to invoke the scripts that execute model inference.

NOTE: The initial version of this project is deployed with restrictions:

  1. The container can be deployed on any platform with Singularity or Docker; however, the associated model checkpoints and statistics files are not included.
  2. To run the canned Python/Bash scripts, the user must log into the Discover cluster and execute the runtime scripts described below.
  3. All paths reflect a static Discover environment, referencing both fully-specified and relative paths to the input data.
  4. To change default parameters, a copy of the runtime scripts should be made by the user and modified accordingly.
  5. Scripts and configuration files, which are hard-coded with parameters for a very specific Discover invocation, typically originated in the separate model projects and were tweaked to run in this environment.
  6. Each FM is entirely independent and has a unique runtime signature.
  7. Output formats vary across FMs.
  8. The development team can provide nominal runtime assistance, but FM model architecture expertise is not provided.

Objectives

  • Library to process FMs using GPU and CPU parallelization.
  • Machine Learning and Deep Learning inference applications.
  • Example scripts for a quick AI/ML start with your own data.

Contributors


User Guide

This User Guide provides instructions for running the inference scripts on Discover only.

Running QEFM Foundation Model Inference Scripts

Allocate a GPU before running the inference scripts:

GPU Allocation (CLI)

salloc --gres=gpu:1 --mem=60G --time=1:00:00 --partition=gpu_a100 --constraint=rome --ntasks-per-node=1 --cpus-per-task=10
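If the allocation is requested often, it is easier to adjust when the tunable values live in shell variables. A minimal sketch (the values simply mirror the example above and are a starting point, not required settings; the final echo makes this a dry run so the command can be reviewed before submitting):

```shell
#!/bin/bash
# Sketch: assemble the salloc request from variables so the GPU
# count, memory, and walltime are easy to tweak per run.
GPUS=1
MEM=60G
TIME=1:00:00
PARTITION=gpu_a100

SALLOC_CMD="salloc --gres=gpu:${GPUS} --mem=${MEM} --time=${TIME} --partition=${PARTITION} --constraint=rome --ntasks-per-node=1 --cpus-per-task=10"

# Dry run: print the command for review; drop the echo to submit it.
echo "${SALLOC_CMD}"
```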

Command-Line Interface (CLI)

To run a Foundation Model task with qefm-core, change to the qefm-core root directory and run the inference script, passing the container name and the model name:

module load singularity
cd <Root directory>
./tests/fm-inference.sh <Container name> <Foundation Model name>

Common CLI Arguments

Command-line argument    Description                                       Required/Optional  Default  Example
<Root directory>         Path of qefm-core installation                    Required           N/A      /discover/nobackup/projects/QEFM/qefm-core
<Container name>         Name of Singularity container image (or sandbox)  Required           N/A      qefm-core-all.sif
<Foundation Model name>  Short title of Foundation Model                   Required           N/A      gencast, aifs, aurora, fourcastnet, graphcast, pangu, privthi, sfno

Examples

Navigate to Root directory on Discover:

cd /discover/nobackup/projects/QEFM/qefm-core

Run Inference for GenCast Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif gencast

Run Inference for AIFS Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif aifs

Run Inference for Aurora Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif aurora

Run Inference for FourCastNet Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif fourcastnet

Run Inference for Pangu Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif pangu

Run Inference for Privthi Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif privthi

Run Inference for GraphCast Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif graphcast

Run Inference for SFNO Foundation Model:

./tests/fm-inference.sh qefm-core-all.sif sfno

Run Inference for All Foundation Models:

./tests/fm-inference.sh qefm-core-all.sif ensemble
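As an alternative to the ensemble keyword, the individual runs above can be driven by a simple loop over the model names. A minimal sketch (container name and model list copied from the examples above; the echo makes it a dry run, so remove it to execute each inference):

```shell
#!/bin/bash
# Sketch: invoke the inference script once per Foundation Model.
CONTAINER=qefm-core-all.sif
MODELS="gencast aifs aurora fourcastnet graphcast pangu privthi sfno"

for model in ${MODELS}; do
  # Dry run: drop the leading echo to actually run the inference.
  echo ./tests/fm-inference.sh "${CONTAINER}" "${model}"
done
```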

Runtime Notes:

Since Singularity caches the container when invoked, specify the cache location explicitly to avoid disk space limitations. On Discover, /lscratch is a convenient place to create a cache directory. The example below sets the appropriate environment variables:

export APPTAINER_TMPDIR=/lscratch/tdirs/gt-scratch/.cache
export APPTAINER_CACHEDIR=/lscratch/tdirs/gt-scratch/.cache
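To be safe, ensure the cache directory exists before pointing Apptainer at it. A minimal sketch (the path here is illustrative, not the required one; on Discover, substitute your own /lscratch directory as in the example above):

```shell
#!/bin/bash
# Sketch: create the cache directory if missing, then point both
# Apptainer cache variables at it. Path is illustrative only.
CACHE_DIR="${TMPDIR:-/tmp}/qefm-apptainer-cache"
mkdir -p "${CACHE_DIR}"

export APPTAINER_TMPDIR="${CACHE_DIR}"
export APPTAINER_CACHEDIR="${CACHE_DIR}"
```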
