Releases: aws/aws-parallelcluster-cookbook

AWS ParallelCluster v3.14.0

30 Sep 12:13
2b959bc

We're excited to announce the release of AWS ParallelCluster Cookbook 3.14.0

This is associated with AWS ParallelCluster v3.14.0

ENHANCEMENTS

  • Include drivers for P6e-GB200 and P6-B200 instances. ParallelCluster sets up the Slurm topology plugin to handle P6e-GB200 UltraServers. See the LIMITATIONS section for important additional setup requirements.
  • Support prioritized and capacity-optimized-prioritized Allocation Strategy. This allows users to prioritize subnets for instance placement to optimize costs and performance.
  • Add build-image support for Amazon Linux 2023 AMIs based on kernel 6.12 (in addition to 6.1).
  • Support DCV on Amazon Linux 2023.
  • Echo chef-client logs in the instance console when a node fails to bootstrap. This helps with investigating bootstrap failures in cases where CloudWatch logs are not available.

LIMITATIONS

  • P6e-GB200 instances are only tested on Amazon Linux 2023, Ubuntu 22.04 and Ubuntu 24.04.
  • Using IMEX on P6e-GB200 requires additional setup. Please refer to the dedicated tutorial in our public documentation.
  • P6-B200 instances are only tested on Amazon Linux 2023, RHEL 8 & 9, Rocky 8 & 9, Ubuntu 22.04 and Ubuntu 24.04.
  • GPU HealthChecks are not recommended for instances with GPU memory above 320GB (such as p6-b200.48xlarge). Health check duration can exceed 10 minutes, potentially causing job failures and significantly reducing the job throughput.

CHANGES

  • Install nvidia-imex for all OSs except Amazon Linux 2.
  • Remove UnkillableStepTimeout from slurm.conf and let Slurm set this value.
  • Upgrade Python runtime used by Lambda functions to Python 3.12 (from 3.9). See Lambda Documentation for important information about Python 3.9 EOL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
  • Support encryption of the EFS file system used for the head node internal shared storage via a new configuration parameter, HeadNode/SharedStorageEfsSettings/Encrypted.
  • Add a validator that warns against using non-GPU instances with DCV.
  • Upgrade Slurm to version 24.11.6 (from 24.05.8).
  • Upgrade EFA installer to 1.43.2 (from 1.41.0).
    • Efa-driver: efa-2.17.2-1
    • Efa-config: efa-config-1.18-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-2.1.0-5
    • Rdma-core: rdma-core-58.0-1
    • Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6-11
  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
  • Upgrade NVIDIA driver to version 570.172.08 (from 570.86.15) for all OSs except Amazon Linux 2.
  • Upgrade CUDA Toolkit to version 12.8.1 (from 12.8.0) for all OSs except Amazon Linux 2.
  • Upgrade DCGM to version 4.4.1 (from 3.3.6) for all OSs except Amazon Linux 2.
  • Upgrade Python to 3.12.11 (from 3.12.8) for all OSs except Amazon Linux 2.
  • Upgrade Python to 3.9.23 (from 3.9.20) for Amazon Linux 2.
  • Upgrade Intel MPI Library to 2021.16.0 (from 2021.13.1).
  • Upgrade DCV to version 2024.0-19030.
  • Upgrade the official ParallelCluster Amazon Linux 2023 AMIs to kernel 6.12 (from 6.1).
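
As an illustration of the new HeadNode/SharedStorageEfsSettings/Encrypted parameter listed above, the following Python sketch sets a slash-separated parameter path on a configuration held as nested dictionaries. The helper name set_config_value, the sample HeadNode contents, and the boolean value are assumptions made for this example; only the parameter path comes from these notes. Real cluster configurations are YAML files validated by the pcluster CLI.

```python
from typing import Any, Dict


def set_config_value(config: Dict[str, Any], path: str, value: Any) -> Dict[str, Any]:
    """Set a slash-separated parameter path on a nested dict, creating levels as needed."""
    keys = path.split("/")
    node = config
    for key in keys[:-1]:
        # Create intermediate sections that are missing from the configuration.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Hypothetical, minimal configuration fragment (not a complete cluster config).
cluster_config = {"HeadNode": {"InstanceType": "t3.micro"}}

# Assumption: the parameter takes a boolean value.
set_config_value(cluster_config, "HeadNode/SharedStorageEfsSettings/Encrypted", True)
print(cluster_config)
```

Existing keys under HeadNode are left untouched; only the missing SharedStorageEfsSettings section is created.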

BUG FIXES

  • Prevent build-image stack deletion failures by deploying a global role that automatically deletes the build-image stack after images either succeed or fail the build.
    The role is meant to exist even after the stack has been deleted. See aws/aws-parallelcluster#5914.
  • Fix an issue where Security Group validation failed when a rule contained both IPv4 ranges (IpRanges) and security group references (UserIdGroupPairs).
  • Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version on the latest Rocky minor version.
  • Fix a cluster ID mismatch issue that causes cluster update failures when Slurm accounting is used.
  • Fix a race condition in CloudWatch Agent startup that could cause node bootstrap failures.
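
To make the Security Group fix above more concrete, here is a minimal Python sketch of a rule that carries both source types. It is not the ParallelCluster validator. The function name rule_sources and the sample values are hypothetical. The IpRanges/CidrIp and UserIdGroupPairs/GroupId fields follow the shape of rules returned by the EC2 DescribeSecurityGroups API.

```python
from typing import Any, Dict, List


def rule_sources(rule: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect every traffic source of one rule, whatever mix of source types it has."""
    return {
        # IPv4 CIDR sources, if any.
        "cidrs": [r["CidrIp"] for r in rule.get("IpRanges", [])],
        # Security group references, if any.
        "security_groups": [p["GroupId"] for p in rule.get("UserIdGroupPairs", [])],
    }


# Hypothetical rule that mixes an IPv4 range with a security group reference.
sample_rule = {
    "IpProtocol": "tcp",
    "FromPort": 22,
    "ToPort": 22,
    "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
    "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
}

print(rule_sources(sample_rule))
```

Reading both lists independently, rather than assuming a rule has only one kind of source, is the point of the sketch.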

DEPRECATIONS

  • The configuration parameter LoginNodes/Pools/Ssh/KeyName has been deprecated, and it will be removed in future releases. The CLI now returns a warning message when it is used in the cluster configuration.
    See aws/aws-parallelcluster#6811.
  • Ubuntu 20.04 is no longer supported.
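
For readers who want to check their own configurations for the deprecated LoginNodes/Pools/Ssh/KeyName parameter, a rough Python sketch follows. It assumes the configuration has already been loaded into nested dictionaries and that Pools is a list of pool definitions with an optional Name. The function name and the warning text are hypothetical and do not reproduce the CLI's actual message.

```python
from typing import Any, Dict, List


def find_deprecated_login_key_names(config: Dict[str, Any]) -> List[str]:
    """Return one warning per login node pool that still sets Ssh/KeyName."""
    warnings = []
    pools = config.get("LoginNodes", {}).get("Pools", [])
    for index, pool in enumerate(pools):
        if "KeyName" in pool.get("Ssh", {}):
            # Fall back to the pool position when no Name is given.
            name = pool.get("Name", f"#{index}")
            warnings.append(f"LoginNodes/Pools/Ssh/KeyName is deprecated (pool {name})")
    return warnings


# Hypothetical configuration fragment with one offending pool.
sample_config = {
    "LoginNodes": {
        "Pools": [
            {"Name": "login-a", "Ssh": {"KeyName": "my-key"}},
            {"Name": "login-b"},
        ]
    }
}

for message in find_deprecated_login_key_names(sample_config):
    print(message)
```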

AWS ParallelCluster v3.13.2

24 Jun 21:39
9a143ec

We're excited to announce the release of AWS ParallelCluster Cookbook 3.13.2

This is associated with AWS ParallelCluster v3.13.2

3.13.2

BUG FIXES

  • Fix build-image failure on Rocky 9, occurring when the parent image does not ship the latest kernel version. See aws/aws-parallelcluster#6874.

AWS ParallelCluster v3.13.1

04 Jun 20:53

We're excited to announce the release of AWS ParallelCluster Cookbook 3.13.1

This is associated with AWS ParallelCluster v3.13.1

CHANGES

  • Upgrade Slurm to version 24.05.8.
  • Upgrade EFA installer to 1.41.0 (from 1.38.1).
    • Efa-driver: efa-2.15.0-1
    • Efa-config: efa-config-1.18-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-2.1.0-1
    • Rdma-core: rdma-core-57.0-1
    • Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.6
  • Upgrade amazon-efs-utils to version 2.3.1 (from 2.1.0) for non-Amazon Linux AMIs.

BUG FIXES

  • Fix an issue where Pyxis was not installed when NVIDIA was already installed.
  • Fix a bug that was preventing the script 'update_directory_service_password.sh' from updating the AD password.

AWS ParallelCluster v3.13.0

01 Apr 20:39

We're excited to announce the release of AWS ParallelCluster Cookbook 3.13.0

This is associated with AWS ParallelCluster v3.13.0

ENHANCEMENTS

  • Add support for Ubuntu 24.04.
  • Disable unused services like cups and wpa_supplicant from Official ParallelCluster AMIs to improve security.

CHANGES

  • Upgrade Slurm to version 24.05.7.
  • Upgrade NVIDIA driver to version 570.86.15 (from 550.127.08) for all OSs except AL2.
  • Upgrade CUDA Toolkit to version 12.8.0 (from 12.4.1) for all OSs except AL2.
  • Upgrade Python to 3.12.8 for all OSs except AL2 (from 3.9.20).
  • On Ubuntu 22.04, install the NVIDIA driver with the same compiler version used to compile the kernel.
  • Upgrade aws-cfn-bootstrap to version 2.0-33.
  • Upgrade EFA installer to 1.38.1 (from 1.36.0).
    • Efa-driver: efa-2.13.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-54.0-1
    • Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
  • Upgrade amazon-efs-utils to version 2.1.0.
  • Remove third-party cookbook: apt-7.5.22 and pyenv-4.2.3.
  • Upgrade third-party cookbook dependencies:
    • line-4.5.21 (from line-4.5.13)
    • nfs-5.1.5 (from nfs-5.1.2)
    • openssh-2.11.14 (from openssh-2.11.12)
    • yum-7.4.20 (from yum-7.4.13)
    • yum-epel-5.0.8 (from yum-epel-5.0.2)
  • Upgrade Pmix to 5.0.6 (from 5.0.3).
  • Upgrade ARM PL to version 24.10 (from 23.10).
  • Remove generation of DSA keys for login nodes, as DSA became unsupported in OpenSSH 9.7+.
  • Set instance ID and instance type information in Slurm upon compute nodes launch.
  • Install NVIDIA drivers without the option 'no-cc-version-check', which is now deprecated in the NVIDIA installer.
  • Reduce RHEL/Rocky Linux boot time through the following network customizations:
    • Configuring higher priority to IPv4 than IPv6
    • Disabling Internet connectivity check
    • Configuring only IPv4 IMDS endpoint to cloud-init

BUG FIXES

  • Remove usage of cfn-init for compute node bootstrapping to reduce node scale-up time.
  • On Ubuntu 22.04, install the NVIDIA driver with the same compiler version used to compile the kernel
    to prevent installation failures.
  • Fix the execution of overriding aws-parallelcluster-node package only on the head node during update.
  • Fix an issue where containerized jobs executed through Pyxis/Enroot in a multi-user environment (integrated with Active Directory) would fail.
  • Fix usage of authselect causing node bootstrap failures on Rocky 9.5+ when directory service is used.

AWS ParallelCluster v3.12.0

18 Dec 22:09
b53aec2

We're excited to announce the release of AWS ParallelCluster Cookbook 3.12.0

This is associated with AWS ParallelCluster v3.12.0

ENHANCEMENTS

  • Extend Amazon DCV support to Ubuntu 22.04 on ARM instances.

CHANGES

  • Upgrade NVIDIA driver to version 550.127.08 (from 550.90.07). This addresses a known issue from NVIDIA.
  • Upgrade Amazon DCV to version 2024.0-18131.
    • server: 2024.0-18131-1
    • xdcv: 2024.0.631-1
    • gl: 2024.0.1078-1
    • web_viewer: 2024.0-18131-1
  • Upgrade EFA installer to 1.36.0.
    • Efa-driver: efa-2.13.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-54.0-1
    • Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
  • Auto-restart slurmctld on failure.
  • Upgrade mysql-community-client to version 8.0.39.

BUG FIXES

  • Fix an issue in the way the region is determined when managing volumes so that Local Zones are handled correctly.
  • Fix an issue where adding EFS filesystems with AccessPointIds during an update would fail.

AWS ParallelCluster v3.11.1

21 Oct 16:53

We're excited to announce the release of AWS ParallelCluster Cookbook 3.11.1

This is associated with AWS ParallelCluster v3.11.1

CHANGES

  • Pyxis is now disabled by default, so it must be manually enabled as documented in the product documentation.
  • Upgrade libjwt to version 1.17.0.

BUG FIXES

  • Fix an issue in the way we configure the Pyxis Slurm plugin in ParallelCluster that can lead to job submission failures.
    aws/aws-parallelcluster#6459

AWS ParallelCluster v3.11.0

25 Sep 20:43
b00a0c6

We're excited to announce the release of AWS ParallelCluster Cookbook 3.11.0

This is associated with AWS ParallelCluster v3.11.0

ENHANCEMENTS

  • Allow custom actions on login nodes.
  • Allow DCV connection on login nodes.
  • Add new attribute efs_access_point_ids to specify optional EFS access points for the mounts.
  • Install Enroot and Pyxis in official ParallelCluster AMIs.

CHANGES

  • Upgrade Slurm to 23.11.10 (from 23.11.7).
  • Upgrade Pmix to 5.0.3 (from 5.0.2).
  • Upgrade EFA installer to 1.34.0.
    • Efa-driver: efa-2.10.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-52.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.3-11
  • Upgrade NVIDIA driver to version 550.90.07 (from 535.183.01).
  • Upgrade CUDA Toolkit to version 12.4.1 (from 12.2.2).
  • Upgrade Python to 3.9.20 (from 3.9.19).
  • Upgrade Intel MPI Library to 2021.13.1.769 (from 2021.12.1.8).

BUG FIXES

  • Fix EFA kmod installation with RHEL 8.10 or newer.

AWS ParallelCluster v3.10.1

08 Jul 20:04

We're excited to announce the release of AWS ParallelCluster Cookbook 3.10.1

This is associated with AWS ParallelCluster v3.10.1

CHANGES

  • There were no changes for this version.

AWS ParallelCluster v3.10.0

27 Jun 21:42

We're excited to announce the release of AWS ParallelCluster Cookbook 3.10.0

This is associated with AWS ParallelCluster v3.10.0

ENHANCEMENTS

  • Add support for external Slurmdbd.
  • Allow build-image to be run in an isolated network.
  • Add support for Amazon Linux 2023.

CHANGES

  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
  • Upgrade munge to version 0.5.16 (from 0.5.15).
  • Upgrade Pmix to 5.0.2 (from 4.2.9).
  • Upgrade third-party cookbook dependencies:
    • apt-7.5.22 (from apt-7.5.14)
    • openssh-2.11.12 (from openssh-2.11.3)
  • Remove third-party cookbook: selinux-6.1.12.
  • Upgrade EFA installer to 1.32.0.
    • Efa-driver: efa-2.8.0-1
    • Efa-config: efa-config-1.16-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.21.0-1
    • Rdma-core: rdma-core-50.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.2-12
  • Upgrade NVIDIA driver to version 535.183.01 (from 535.154.05).
  • Upgrade Python to 3.9.19 (from 3.9.17).
  • Upgrade Intel MPI Library to 2021.12.1.8 (from 2021.9.0.43482).

BUG FIXES

  • Fix an issue that prevented cluster updates from including EFS filesystems with encryption in transit.
  • Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when
    EFS is used for shared internal data.
  • On Ubuntu systems, remove the default logrotate configuration for cloud-init log files, which clashed with the
    configuration coming from ParallelCluster.
  • Remove /etc/profile.d/pcluster.sh so that it is not executed at every user login and
    cfn_bootstrap_virtualenv is not added to the PATH environment variable.
  • Fix image build failure with RHEL 8.10 or newer.

AWS ParallelCluster v3.9.3

19 Jun 12:19

We're excited to announce the release of AWS ParallelCluster Cookbook 3.9.3

This is associated with AWS ParallelCluster v3.9.3

ENHANCEMENTS

  • Add support for FSx Lustre as a shared storage type in us-iso-east-1.

BUG FIXES

  • Remove cloud_dns from the SlurmctldParameters in the Slurm config to avoid Slurm fanout issues.
    This is also not required since we set the IP addresses on instance launch.
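
The effect of removing cloud_dns from SlurmctldParameters can be sketched in Python as an edit to the comma-separated option list in slurm.conf. This is an illustration only, not how the cookbook renders its Slurm configuration. The function name and the sample configuration text, including the other option shown, are hypothetical.

```python
def remove_slurmctld_parameter(conf_text: str, option: str) -> str:
    """Drop one option from the comma-separated SlurmctldParameters line, if present."""
    result = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "slurmctldparameters":
            kept = [item.strip() for item in value.split(",") if item.strip() != option]
            if not kept:
                # No options left, so drop the whole line.
                continue
            line = key.strip() + "=" + ",".join(kept)
        result.append(line)
    return "\n".join(result)


# Hypothetical slurm.conf excerpt used only for this example.
sample_conf = "ClusterName=demo\nSlurmctldParameters=idle_on_node_suspend,cloud_dns\n"

print(remove_slurmctld_parameter(sample_conf, "cloud_dns"))
```

Other options on the same line are preserved, and the line is removed entirely only when cloud_dns was its sole entry.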