HOWTO run jobs on the osg using the GlueX singularity container

===What is the Gluex singularity container?===
A major hurdle in getting your GlueX simulation and analysis scripts to run on the Open Science Grid (OSG) is replicating your local working environment, consisting of database files, executable binaries, libraries, and system packages, on the remote site where your job runs. The GlueX singularity container does exactly that: it replicates your working environment on the JLab CUE at the remote site. Singularity is an implementation of the "Linux container" concept that allows a user to bundle up the applications, libraries, and entire system directory structure that describe how you work on one system, and move the whole thing as a unit to another host, where it can be started up as if it were running in the original context. In some ways this is similar to virtualization (e.g. VirtualBox), except that it does not suffer from the inefficiencies of virtualization: in terms of both computation speed and memory use, processes running inside a Singularity container are just as efficient as if you had rebuilt and run them in the local OS environment.
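For a quick look around inside the container, you can open an interactive shell in it directly on scosg16. The following is only a sketch: it assumes the singularity command is installed on scosg16.jlab.org and uses the same image path that appears in the submit file in the next section.

scosg16.jlab.org> singularity shell -B /cvmfs /cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest

Type exit to leave the container shell and return to your normal login environment.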
  
 
===How do I submit a job to run in the container?===
 
Here is an example job submission script for the OSG that uses the GlueX Singularity container maintained by Mark Ito and distributed through the CernVM File System (CVMFS, mounted at /cvmfs on scosg16.jlab.org). The path to the container appears on the +SingularityImage line in the submit script below. Change the x509userproxy line to point to your own OSG proxy certificate, and make sure the proxy has several hours of lifetime left before you submit your job with condor_submit. The local directory osg.d (or whatever name you choose) should be created in your home directory ahead of time to receive the stdout and stderr logs from your jobs; a short sketch of these steps follows the submit file.

scosg16.jlab.org> cat my_osg_job.sub
executable = osg-container.sh
output = osg.d/stdout.$(PROCESS)
error = osg.d/stderr.$(PROCESS)
log = tpolsim_osg.log
notification = never
universe = vanilla
arguments = bash tpolsim_osg.bash $(PROCESS) 100000
should_transfer_files = yes
x509userproxy = /tmp/x509up_u7896
transfer_input_files = tpolsim_osg.bash,setup_osg.sh,control.in0,control.in1,postconv.py,postsim.py
WhenToTransferOutput = ON_EXIT
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
RequestCPUs = 1
Requirements = HAS_SINGULARITY == True
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest"
+SingularityBindCVMFS = True
queue 10
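
Before submitting, check that your proxy still has enough lifetime left and that the log directory exists. The following sequence is only a sketch, using the file names from the example above; voms-proxy-info, condor_submit, and condor_q are the standard VOMS and HTCondor client commands.

scosg16.jlab.org> voms-proxy-info -all          # look at the "timeleft" field
scosg16.jlab.org> mkdir -p osg.d                # receives the stdout and stderr logs
scosg16.jlab.org> condor_submit my_osg_job.sub
scosg16.jlab.org> condor_q                      # watch the status of the 10 queued jobs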

The script osg-container.sh starts up the container and makes sure that all of the environment settings for the /group/halld work environment are configured inside it. Fetch a copy of this script from the git repository https://github.com/rjones30/gluex-osg-jobscripts.git and customize it for your own work. A convenient feature of this script is that you can run it on scosg16 from your regular login shell, and it gives you the same environment that your job will see when it starts on a remote OSG site. For example, "./osg-container.sh bash" starts up a bash shell inside the container, where you can execute local GlueX commands as if you were on the ifarm -- apart from access to the central data areas and /cache, of course.
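
As a concrete sequence (a sketch; gluex-osg-jobscripts is simply the default directory name created by git clone):

scosg16.jlab.org> git clone https://github.com/rjones30/gluex-osg-jobscripts.git
scosg16.jlab.org> cd gluex-osg-jobscripts
scosg16.jlab.org> ./osg-container.sh bash

From inside that shell, type exit to return to your normal login environment on scosg16.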