Difference between revisions of "HOWTO use AmpTools on the JLab farm with MPI"

From GlueXWiki

Revision as of 10:15, 13 January 2022

Load MPI module

On the ifarm you can load the MPI module with

module load mpi/openmpi-4.0.1

This provides the binaries needed to compile (mpicxx) and run (mpirun) MPI programs, which you can locate with

which mpicxx
which mpirun
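
The check above can also be scripted. The following is a minimal sketch in bash (the compilation example below was done in csh, so this is an illustration rather than part of that recipe). The function name missing_tools is hypothetical and not part of any JLab or AmpTools setup script.

```shell
#!/bin/bash
# Print the name of every given tool that is NOT found on PATH,
# one per line. Returns 0 only if all tools were found.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, after `module load mpi/openmpi-4.0.1`:
#   missing_tools mpicxx mpirun || echo "MPI module does not seem to be loaded"
```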

AmpTools Compilation MPI

This example was done in csh on ifarm1901

1) Download latest AmpTools release

git clone git@github.com:mashephe/AmpTools.git

2) Set AMPTOOLS directory

setenv AMPTOOLS_HOME $PWD/AmpTools/
setenv AMPTOOLS $AMPTOOLS_HOME/AmpTools/

3) Put root-config in your path (assumes ROOTSYS set by some other setup script)

setenv PATH $ROOTSYS/bin:$PATH

4) Build main AmpTools library with MPI support (temporary branch to support openmpi version 4 on ifarm)

cd $AMPTOOLS
make MPI=1

Fitter Compilation with MPI

The only MPI-dependent part of halld_sim is fitMPI.cc, an optional build for MPI fits that is analogous to the usual fit.cc without MPI. You can build the fitMPI executable with the following commands (this requires a git pull of the halld_sim master branch after 1/13/22):

cd $HALLD_SIM_HOME/src/programs/AmplitudeAnalysis/fitMPI/
scons -u install

Performing Fits Interactively

The fitMPI executable is run with mpirun

mpirun -np N fitMPI -c YOURCONFIG.cfg

where N is the number of parallel processes to use in the fit and YOURCONFIG.cfg is your usual config file. Note: additional command line parameters can be used as well, as needed.
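
As an illustration of passing additional parameters, here is a small bash sketch that only prints the command line for an interactive fit and does not execute anything. The -s and -m options are the ones used in the batch example on this page; -np is Open MPI's option for the number of processes. The function name fit_command is hypothetical.

```shell
#!/bin/bash
# Print (dry run) the mpirun command line for an interactive fit.
#   $1 = number of parallel processes (positive integer)
#   $2 = config basename, i.e. YOURCONFIG without the .cfg suffix
# (fit_command is a hypothetical helper name.)
fit_command() {
    local nproc="$1" name="$2"
    case "$nproc" in
        ''|*[!0-9]*)
            echo "number of processes must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$nproc" -lt 1 ]; then
        echo "number of processes must be a positive integer" >&2
        return 1
    fi
    echo "mpirun -np $nproc fitMPI -c $name.cfg -s ${name}_params.dat -m 50000"
}

# Usage: inspect the command first, then run it by hand:
#   fit_command 4 YOURCONFIG
```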

Submitting Batch Jobs

To submit an MPI-enabled job to the JLab farm, we can use the slurm scheduler directly. One way to do this is with a script that contains slurm directives telling the scheduler how to run your job.

Example for a slurm submit script:

#!/bin/bash -l
#SBATCH -A halld
#SBATCH -p ifarm
#SBATCH -t 1-00:00:00
#SBATCH -J FITNAME
#SBATCH --mail-type=begin        # send email when job begins
#SBATCH --mail-type=end          # send email when job ends
#SBATCH --mail-type=fail         # send email if job fails
#SBATCH --mail-user=USER@jlab.org
#SBATCH --ntasks=100
/usr/local/openmpi/openmpi-4.0.1/bin/mpirun --mca btl_openib_allow_ib 1 fitMPI -c YOURCONFIG.cfg -s YOURCONFIG_params.dat -m 50000 >& YOURCONFIG.log

You can copy and paste the above lines into a text file called, for example, submit.sh. Remember to replace FITNAME, USER and YOURCONFIG with your own values.
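
If you submit many fits, the placeholder replacement can be automated. The following bash sketch assumes the example script has been saved unchanged as a template file (the name submit_template.sh is hypothetical) and fills in the three placeholders with sed. It assumes the values contain no '/', '&' or backslash characters and do not themselves contain one of the placeholder words.

```shell
#!/bin/bash
# Fill the FITNAME, USER and YOURCONFIG placeholders of a template
# read from stdin and write the result to stdout.
#   $1 = job name, $2 = JLab user name, $3 = config basename (no .cfg)
# (fill_submit_template is a hypothetical helper name.)
fill_submit_template() {
    sed -e "s/FITNAME/$1/g" \
        -e "s/USER/$2/g" \
        -e "s/YOURCONFIG/$3/g"
}

# Usage (submit_template.sh is a hypothetical copy of the script above):
#   fill_submit_template myfit jdoe myconfig < submit_template.sh > submit.sh
```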

This script sends an email to your JLab email address when the job starts, ends or fails. If you do not want these notifications, remove the corresponding --mail-type and --mail-user lines from the submit script.

In this particular example, 100 processes are started (--ntasks=100). The slurm scheduler spawns them on arbitrary nodes as they become available on the farm.

Furthermore, the script submits the job to the slurm partition called ifarm. If you want to submit to a different partition, modify the -p option in the script.

The last line of the script is the command that is actually executed. It requires that the fitMPI executable has been compiled and can be run in your environment. Some options are passed to the fitMPI program: -s saves the final parameters in a text file, and -m explicitly sets the number of Minuit calls to 50000. Remove or modify these options as necessary.

When you have adjusted the options to your needs, submit the job to the batch system using the command

sbatch <path/to/your/script/>submit.sh

After submission, you can use standard slurm commands to check and control your fit. To check the status of all your jobs:

squeue -u USER

In case something goes wrong, you can terminate all your jobs with

scancel -u USER

or a specific job with

scancel JOBID
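
The status check can also be wrapped in a small polling loop, for example to run a follow-up step once a fit has left the queue. The bash sketch below assumes that squeue prints nothing for a job that is no longer pending or running; the -h (no header) and -j (job ID) options are standard squeue options. The function names job_in_queue and wait_for_job are hypothetical.

```shell
#!/bin/bash
# Return 0 while job $1 is still listed by squeue, non-zero otherwise.
# Assumption: squeue prints no line for a job that has left the queue.
# (job_in_queue and wait_for_job are hypothetical helper names.)
job_in_queue() {
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Block until job $1 is no longer listed, polling every $2 seconds
# (default: 60).
wait_for_job() {
    local jobid="$1" interval="${2:-60}"
    while job_in_queue "$jobid"; do
        sleep "$interval"
    done
}

# Usage: `sbatch --parsable` prints only the job ID (followed by
# ";cluster" on multi-cluster setups), so it can be captured:
#   jobid=$(sbatch --parsable submit.sh)
#   wait_for_job "$jobid" && echo "job $jobid has left the queue"
```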