HOWTO run jobs on the osg using the GlueX singularity container


What is the GlueX singularity container?

The GlueX singularity container replicates your local working environment on the JLab CUE, including database files, executable binaries, libraries, and system packages, on the remote site where your job runs. Singularity is an implementation of the "Linux container" concept: it lets a user bundle up the applications, libraries, and entire system directory structure that describe how you work on one system, and move the whole thing as a unit to another host, where it can be started up as if it were running in the original context. In some ways this is similar to virtualization (e.g. VirtualBox), except that it does not suffer from virtualization's inefficiencies. In terms of both computation speed and memory use, processes running inside a Singularity container are just as efficient as if you had rebuilt and run them directly in the host OS environment.
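
If you are curious what this environment looks like from the inside, and you are on a machine with both singularity and the /cvmfs mount available (scosg16.jlab.org is one such host), you can open a shell in the container by hand. This is a minimal sketch, assuming the image path used in the submit file below; the -B option binds the host /cvmfs into the container, much as the SingularityBindCVMFS line does for batch jobs.

scosg16.jlab.org> singularity shell -B /cvmfs /cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest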

How do I submit a job to run in the container?

Here is an example job submission script for the OSG that uses the GlueX singularity container maintained by Mark Ito on OSG network storage resources. These OSG storage resources are visible on the JLab machine scosg16.jlab.org at mount point /cvmfs. You can see the path under /cvmfs to the GlueX singularity container on the SingularityImage line in the submit script below. You should change the OSG proxy certificate named on the x509userproxy line to point to your own proxy certificate, and make sure it has several hours of lifetime left before you submit your jobs with condor_submit on scosg16. The local directory osg.d (or whatever name you choose) should be created in your work directory, preferably under /osgpool/halld/userid, to receive the stdout and stderr logs from your jobs.

scosg16.jlab.org> cat my_osg_job.sub
executable = osg-container.sh
output = osg.d/stdout.$(PROCESS)
error = osg.d/stderr.$(PROCESS)
log = tpolsim_osg.log
notification = never
universe = vanilla
arguments = bash tpolsim_osg.bash $(PROCESS)
should_transfer_files = yes
x509userproxy = /tmp/x509up_u7896
transfer_input_files = tpolsim_osg.bash,setup_osg.sh,control.in0,control.in1,postconv.py,postsim.py
WhenToTransferOutput = ON_EXIT
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
RequestCPUs = 1
Requirements = (HAS_SINGULARITY =?= True)
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/markito3/gluex_docker_devel:latest"
+SingularityBindCVMFS = True
queue 10
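
Before submitting, check that your proxy has enough lifetime left and that the log directory exists. A minimal sketch, assuming a VOMS proxy and the file names above (voms-proxy-info is one common way to inspect a proxy; grid-proxy-info reports similarly):

scosg16.jlab.org> voms-proxy-info -timeleft     # seconds of lifetime remaining on the proxy
scosg16.jlab.org> mkdir -p osg.d                # receives the stdout/stderr logs
scosg16.jlab.org> condor_submit my_osg_job.sub  # queues 10 jobs, per the queue statement
scosg16.jlab.org> condor_q                      # watch their progress in the local queue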

The script osg-container.sh starts up the container and makes sure that it has all of the environment settings configured for the /group/halld work environment. Fetch a copy of this script from the git repository https://github.com/rjones30/gluex-osg-jobscripts.git and customize it for your own work. A convenient feature of this script is that you can run it on scosg16 from your regular login shell, and it gives you the same environment your job will see when it starts on a remote OSG site. For example, "./osg-container.sh bash" starts up a bash shell inside the container, where you can execute local GlueX commands as if you were on the ifarm, apart from access to the central data areas and /cache, of course.
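
For example, to fetch the scripts and try out the container environment interactively (a sketch based on the repository and command named above):

scosg16.jlab.org> git clone https://github.com/rjones30/gluex-osg-jobscripts.git
scosg16.jlab.org> cd gluex-osg-jobscripts
scosg16.jlab.org> ./osg-container.sh bash       # bash shell inside the container, GlueX environment loaded

Exiting that shell returns you to your regular login environment on scosg16.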

How do I submit a job to run on OSG sites without singularity?

Singularity simplifies the writing and debugging of OSG job scripts by allowing them to run in an environment that closely mimics the JLab CUE. However, a significant fraction of the resources on the OSG do not have singularity installed, including in particular a number of university clusters that provide opportunistic cycles to OSG users. As more and more opportunistic OSG jobs come to require singularity, the sites without it become under-subscribed, so jobs that can run without singularity gain access to a pool of otherwise idle resources. This section explains how to take your existing OSG workflow based on the GlueX singularity container and allow it to run on sites that do not have singularity installed, by making a couple of simple changes to the submit file and job script. The example below illustrates the minimal changes to the container-based submission script above that enable it to run on non-singularity OSG resources.

scosg16.jlab.org> cat my_nosg_job.sub
executable = osg-nocontainer.sh
output = osg0.d/stdout.$(PROCESS)
error = osg0.d/stderr.$(PROCESS)
log = tpolsim_osg.log
notification = never
universe = vanilla
arguments = bash tpolsim_osg.bash $(PROCESS)
should_transfer_files = yes
x509userproxy = /tmp/x509up_u7896
transfer_input_files = osg-nocontainer_2.29_jlab.env,osg-container-helper.sh,tpolsim_osg.bash,setup_osg.sh,control.in0,control.in1,postconv.py,postsim.py
WhenToTransferOutput = ON_EXIT
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
on_exit_hold = false
RequestCPUs = 1
Requirements = (HAS_CVMFS_oasis_opensciencegrid_org =?= True) && (HAS_CVMFS_singularity_opensciencegrid_org =?= True)
queue 10
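
As with the container-based workflow, create the log directory named in the submit file before submitting. A short sketch, assuming the file names above:

scosg16.jlab.org> mkdir -p osg0.d
scosg16.jlab.org> condor_submit my_nosg_job.sub
scosg16.jlab.org> condor_q -nobatch             # list jobs individually rather than as batches

Note that the Requirements line now demands the oasis and singularity CVMFS repositories rather than HAS_SINGULARITY: the job script can then reach the unpacked container files directly under /cvmfs, with the help of the osg-nocontainer_2.29_jlab.env environment file, without starting a container around them.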