HOWTO run jobs on the osg using the GlueX singularity container


What is the GlueX singularity container?

The GlueX singularity container replicates your local working environment on the JLab CUE, including database files, executable binaries, libraries, and system packages, on the remote site where your job runs. Singularity is an implementation of the "Linux container" concept: it lets a user bundle up the applications, libraries, and entire system directory structure that describe how you work on one system, and move them as a unit to another host, where they can be started up as if they were running in the original context. In some ways this is similar to virtualization (e.g. VirtualBox), except that it does not suffer from the inefficiencies of virtualization. In terms of both computation speed and memory resources, processes running inside a Singularity container are just as efficient as if you were to rebuild and run them in the local OS environment.
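
For a quick look inside the container on scosg16, you can point the singularity command directly at the image on /cvmfs. This is only a sketch: the exact command-line options depend on the Singularity version installed on scosg16, and the recommended way to enter the container is the osg-container.sh script described below. The image path is the one used in the submit script further down.

scosg16.jlab.org> # open an interactive shell inside the GlueX container image, with /cvmfs bound inside
scosg16.jlab.org> singularity shell --bind /cvmfs /cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest
Singularity> cat /etc/redhat-release      # you are now inside the container; check the OS release, for example
Singularity> exit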

How do I submit a job to run in the container?

Here is an example job submission script for the OSG that uses the GlueX Singularity container maintained by Mark Ito on OSG network storage resources. These OSG storage resources are visible on the JLab machine scosg16.jlab.org at the mount point /cvmfs; the path under /cvmfs to the GlueX singularity container appears on the SingularityImage line in the submit script below. Change the name of the OSG proxy certificate on the x509userproxy line to point to your own proxy certificate, and make sure it has several hours left on it before you submit your jobs using condor_submit on scosg16. The local directory osg.d (or whatever name you choose) should be created in your work directory, preferably under /osgpool/halld/userid, to receive the stdout and stderr logs from your jobs. A sketch of a complete submission session is given after the script.

scosg16.jlab.org> cat my_osg_job.sub
executable = osg-container.sh
output = osg.d/stdout.$(PROCESS)
error = osg.d/stderr.$(PROCESS)
log = tpolsim_osg.log
notification = never
universe = vanilla
arguments = bash tpolsim_osg.bash $(PROCESS) 100000
should_transfer_files = yes
x509userproxy = /tmp/x509up_u7896
transfer_input_files = tpolsim_osg.bash,setup_osg.sh,control.in0,control.in1,postconv.py,postsim.py
WhenToTransferOutput = ON_EXIT
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
RequestCPUs = 1
Requirements = HAS_SINGULARITY == True
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest"
+SingularityBindCVMFS = True
queue 10
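
The following is a minimal sketch of how a submission session might look on scosg16, assuming the files named in the script are present in the current directory. The voms-proxy-init and voms-proxy-info commands are one common way to create and inspect a grid proxy; the exact command and options to use depend on your grid setup, so treat them as placeholders.

scosg16.jlab.org> cd /osgpool/halld/userid/tpolsim     # hypothetical work directory under /osgpool/halld/userid
scosg16.jlab.org> mkdir -p osg.d                       # receives the stdout and stderr logs from the jobs
scosg16.jlab.org> voms-proxy-init -valid 24:00         # create a fresh proxy (placeholder command, depends on your setup)
scosg16.jlab.org> voms-proxy-info -timeleft            # confirm several hours remain on the proxy
scosg16.jlab.org> condor_submit my_osg_job.sub         # queues 10 jobs, one per $(PROCESS) value
scosg16.jlab.org> condor_q                             # monitor the jobs in the local condor queue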

The script osg-container.sh starts up the container and makes sure that it has all of the environment settings configured for the /group/halld work environment. Fetch a copy of this script from the git repository https://github.com/rjones30/gluex-osg-jobscripts.git and customize it for your own work. A useful feature of this script is that you can run it on scosg16 from your regular login shell, and it gives you the same environment that exists when your job starts on a remote OSG site. For example, "./osg-container.sh bash" starts up a bash shell inside the container, where you can execute local GlueX commands as if you were on the ifarm -- apart from access to the central data areas and /cache, of course.
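
As a concrete example of that workflow, the session below assumes only what is stated above: the script lives in the rjones30/gluex-osg-jobscripts repository and can be run directly on scosg16. The command run inside the container is just an illustrative check, not part of the script itself.

scosg16.jlab.org> git clone https://github.com/rjones30/gluex-osg-jobscripts.git
scosg16.jlab.org> cd gluex-osg-jobscripts
scosg16.jlab.org> ./osg-container.sh bash     # opens an interactive bash shell inside the container
bash$ which hd_root                           # hypothetical check that GlueX executables are on your path
bash$ exit                                    # leave the container, back to the scosg16 login shell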