HOWTO use AmpTools on the JLab farm GPUs
Revision as of 13:22, 29 April 2024
=== Access through SLURM ===
JLab currently provides NVIDIA Titan RTX and T4 cards on the sciml19 and sciml21 nodes, and four NVIDIA A100 (80 GB RAM) cards on each of the two sciml23 nodes. The nodes can be accessed through SLURM, where N is the number of requested cards (1-4):
>salloc --gres gpu:TitanRTX:N --partition gpu --nodes 1 --mem-per-cpu=4G
or
>salloc --gres gpu:T4:N --partition gpu --nodes 1 --mem-per-cpu=4G
or
>salloc --gres gpu:A100:N --partition gpu --nodes 1 --mem-per-cpu=4G
The default memory request is 512MB per CPU, which is often too small.
An interactive shell (e.g. bash) on the node with requested allocation can be opened with srun:
>srun --pty bash
Information about the cards, cuda version and usage is displayed with this command:
 >nvidia-smi
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  TITAN RTX           Off  | 00000000:3E:00.0 Off |                  N/A |
 | 41%   27C    P8     2W / 280W |      0MiB / 24190MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
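For scripting (e.g. checking that a card is idle before launching a fit), nvidia-smi also has a machine-readable query mode, <code>nvidia-smi --query-gpu=name,memory.used,utilization.gpu --format=csv,noheader</code>. A minimal sketch that parses one line of that CSV output; the sample line is hard-coded here so the parsing can be shown without a GPU present:

```shell
# One line of the CSV produced by
#   nvidia-smi --query-gpu=name,memory.used,utilization.gpu --format=csv,noheader
# hard-coded as a sample so this runs without a GPU.
line='TITAN RTX, 0 MiB, 0 %'
name=$(printf '%s\n' "$line" | awk -F', ' '{print $1}')   # card name
util=$(printf '%s\n' "$line" | awk -F', ' '{print $3}')   # GPU utilization
echo "$name is at $util utilization"
```

On a GPU node, replace the hard-coded sample with the actual `nvidia-smi` call.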
=== AmpTools Compilation with CUDA ===
This example was done in csh for the Titan RTX cards available on sciml1902.
The compilation does not have to be performed on a machine with GPUs. We chose the interactive node ifarm1901 here.
1) Download latest AmpTools release
git clone git@github.com:mashephe/AmpTools.git
2) Set AMPTOOLS directory
 setenv AMPTOOLS_HOME $PWD/AmpTools/
 setenv AMPTOOLS $AMPTOOLS_HOME/AmpTools/
3) Load cuda environment module (source <code>/etc/profile.d/modules.csh</code> first if the <code>module</code> command is not found)
 module add cuda
 setenv CUDA_INSTALL_PATH /apps/cuda/11.4.2/
With the advent of AlmaLinux9 at JLab, the modules were moved from /apps to /cvmfs:
 module use /cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/modulefiles
 module load cuda
 setenv CUDA_INSTALL_PATH /cvmfs/oasis.opensciencegrid.org/jlab/scicomp/sw/el9/cuda/12.2.2/
4) Set AMPTOOLS directory (skip if AMPTOOLS was already set in step 2)
 setenv AMPTOOLS $PWD/AmpTools
5) Put root-config in your path
setenv PATH $ROOTSYS/bin:$PATH
6) Set the appropriate architecture for the cuda compiler (info e.g. [https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ here])
 setenv GPU_ARCH sm_75 (for T4 and TitanRTX)
 setenv GPU_ARCH sm_80 (for A100)
 setenv GPU_ARCH sm_80 (for A800; like the A100 it is compute capability 8.0)
For older (pre-0.13) versions of AmpTools, edit the Makefile and adjust the line:
CUDA_FLAGS := -m64 -arch=sm_75
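The card-to-architecture mapping above can be wrapped in a small helper so setup scripts pick the flag automatically. A minimal sketch in POSIX sh; the function name and the card-name spellings are assumptions, and it covers only the cards listed on this page:

```shell
# Map a JLab GPU model name to the matching nvcc architecture flag.
# The function name is a hypothetical example; A800 -> sm_80 is an
# assumption (the A800 is also compute capability 8.0).
gpu_arch() {
    case "$1" in
        T4|TitanRTX) echo sm_75 ;;
        A100|A800)   echo sm_80 ;;
        *)           echo "unknown GPU: $1" >&2; return 1 ;;
    esac
}

gpu_arch A100   # prints sm_80
```

In csh the same mapping would be a `switch` block; the idea is simply to keep the GPU_ARCH choice in one place.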
7) Build main AmpTools library with GPU support
 cd $AMPTOOLS_HOME
 make gpu
=== halld_sim Compilation with GPU ===
The GPU-dependent part of halld_sim is libraries/AMPTOOLS_AMPS/, where the GPU kernels are located. With the environment set up as above, compile the full halld_sim; the build will recognize the AmpTools GPU flag and build the necessary libraries and executables to run on the GPU:
 cd $HALLD_SIM_HOME/src/
 scons -u install -j8
=== Performing Fits Interactively ===
With the environment set up as above, the fit executable is run the same way as on a CPU:
 fit -c YOURCONFIG.cfg
where YOURCONFIG.cfg is your usual config file. Note: additional command line parameters can be used as well, as needed.
== Combining GPU and MPI ==
To utilize multiple GPUs in the same fit you'll need both the AmpTools and halld_sim libraries to be compiled with GPU and MPI support. To complete the steps below you'll need to be logged into one of the sciml nodes with GPU support (as described above). More about MPI can be found here: [[HOWTO use AmpTools on the JLab farm with MPI]]
=== AmpTools ===
Build the main AmpTools library with GPU and MPI support (note "mpigpu" option). If you are missing mpicxx you can load it using "module load mpi/openmpi3-x86_64"
 cd $AMPTOOLS_HOME
 make mpigpu
=== halld_sim ===
With the environment set up as above, only the fitMPI executable needs to be recompiled; the build will recognize the AmpTools GPU and MPI flags and build the necessary libraries and executables to run on the GPU with MPI:
 cd $HALLD_SIM_HOME/src/programs/AmplitudeAnalysis/fitMPI/
 scons -u install
=== Performing Fits Interactively ===
The fitMPI executable is run with mpirun, the same as on a CPU:
 mpirun fitMPI -c YOURCONFIG.cfg
If you're using Slurm it will recognize how many GPUs you've reserved and assign the number of parallel processes to make use of those GPUs.
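When launching by hand outside Slurm, the process count must be chosen explicitly. A minimal sketch, under the assumption of a leader/follower MPI model in which one rank only coordinates the fit and each remaining rank drives one GPU (consistent with the batch script example on this page, which pairs one GPU with --ntasks=2):

```shell
# Assumption: one leader rank plus one follower rank per GPU.
NGPU=2                    # number of GPUs reserved (example value)
NTASKS=$((NGPU + 1))      # follower ranks + 1 leader rank
echo "mpirun -n $NTASKS fitMPI -c YOURCONFIG.cfg"
```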
== Submitting Batch Jobs ==
This example script can be submitted via slurm using the <code>sbatch</code> command. WORKDIR has to be replaced with the full path to an existing directory that contains the configuration file FILE.cfg, and ENV.csh has to be replaced with a shell setup script.
 #!/bin/csh
 #SBATCH --nodes=1
 #SBATCH --partition=gpu
 #SBATCH --gres=gpu:A100:1
 #SBATCH --cpus-per-task=1
 #SBATCH --ntasks-per-core=1
 #SBATCH --threads-per-core=1
 #SBATCH --mem=10GB
 #SBATCH --time=8:00:00
 #SBATCH --ntasks=2
 #SBATCH --chdir=WORKDIR
 #SBATCH --error=WORKDIR/log/fit.err
 #SBATCH --output=WORKDIR/log/fit.out
 #SBATCH --job-name=MyGPUfit
 
 source ENV.csh
 fit -c WORKDIR/FILE.cfg -m 1000000 -r 10
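Replacing the WORKDIR placeholders by hand is error-prone. A minimal sketch that stamps them in with sed before submitting; the template and output file names (fit_template.csh, fit_job.csh) are hypothetical examples:

```shell
# Substitute every WORKDIR placeholder in a copy of the batch script
# template. File names and the WORKDIR path are hypothetical examples.
WORKDIR=/volatile/halld/home/$USER/myfit
printf '#SBATCH --chdir=WORKDIR\n#SBATCH --error=WORKDIR/log/fit.err\n' > fit_template.csh
sed "s|WORKDIR|$WORKDIR|g" fit_template.csh > fit_job.csh
cat fit_job.csh
# then submit with:  sbatch fit_job.csh
```

Using `|` as the sed delimiter avoids clashing with the `/` characters in the path.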