Difference between revisions of "Using the Grid"

From GlueXWiki
Jump to: navigation, search
(Accessing stored data over SRM)
(Accessing stored data over SRM)
Line 30: Line 30:
 
<pre>
 
<pre>
 
srmcp srm://grinch.phys.uconn.edu/Gluex/dc1.1-12-2012/dana_rest_1000596.hddm \
 
srmcp srm://grinch.phys.uconn.edu/Gluex/dc1.1-12-2012/dana_rest_1000596.hddm \
            file:///`pwd`/dana_rest_1000596.hddm
+
      file:///`pwd`/dana_rest_1000596.hddm
 
</pre>
 
</pre>

Revision as of 00:18, 30 December 2012

A Quick Start Howto for GlueX Members

Richard Jones, University of Connecticut


The GlueX collaboration is registered as a Virtual Organization (VO) within the Open Science Grid (OSG). One requirement for all VO's is that someone within the collaboration be the primary point of support for the members of the VO, and provide both documentation and assistance in problem-solving for users of grid resources and tools within the collaboration. I am that point of support for GlueX members, for the period leading up to the start of physics data taking. In return for the support I provide to Gluex VO users, I have direct access to experts within the OSG central support to help with issues that are beyond my control or expertise. This document is only the starting point for my support to GlueX members. Feedback on the accuracy, organization, and general usefulness of this page will be appreciated.

Who should read this document?

This document was written for those directly involved in the production of large-scale physics simulation data sets, and the end-users who want easy access to these data for carrying out physics analyses. In the past this has primarily been graduate students. Blake Leverington has written a detailed beginner's guide based on his experience, and Jake Bennett has provided a separate instructions page addressing issues he encountered that were not covered by Blake. Those pages will be useful to those who find this document assumes too much background knowledge, or uses too much grid jargon. If you just want to get started with a minimum of verbage, and don't need every term defined and concept explained, this is the place to start.

Preparation

Before you can start to use the grid, you need to install two objects in your login environment: your personal grid certificate, and the OSG Client Software bundle. The OSG Client is already installed for your use on the Jlab ifarm machines in the /apps area. Simply source /apps/osg/osg-client/setup.[c]sh to include the tools in your path. Personally I find that there are things in the OSG Client that mess with the standard Jlab CUE environment (like its own release of the python interpreter) so I set up a one-line shell script called "osg" that I put in my ~/bin directory, so that I can type the command "osg <command> <args>..." and have <command> see the OSG Client tools ahead of everything else in my path.

Installing the OSG Client bundle on your own Linux/Mac desktop is a good idea, but it is beyond the scope of this document. I have collected a set of instructions for installing the grid tools on Mac OS/X from various sources on the web. The OSG Grid Operations Center (GOC) distributes a standard set of RPM's that make it extremely simple to install the OSG Client on a Linux desktop. Root access is required. The OSG Client is not presently available for Windows desktops.

If you have never had a grid certificate, go to this separate page for instructions on how to obtain one and have it registered with the Gluex VO. The certificate itself is public, but the private key that comes with it should be carefully guarded and protected with a secure password. When the two are bundled together in a single file encrypted with a password, it is called a 'PKCS12' file (file extension .p12, or sometimes .pfx on Windows). Import this file into your favorite browser(s) so that you can use it to authenticate yourself to secure web services when needed, and into your email client so you can use it to attach your own digital signature to email messages when requested. However, the most important use of your certificate will be when you use it to generate a proxy. For that to work, save a copy of your .p12 file at ~/.globus/usercred.p12 in your home directory on your linux/Mac work machine.

A proxy is an authorization from a grid security hub that identifies you as a valid member of the Gluex VO and authorizes you for access to grid resources. You can do nothing on the grid without your proxy. The proxy resides within your unix (or Mac OS/X) shell environment, and gets picked up by grid tools and passed around the network together with your service requests. Generate your proxy with a command like the following.

$ voms-proxy-init -voms Gluex:/Gluex -valid 24:0

The VOMS (VO Membership Service) is a grid service that knows about the Gluex VO. Note that Gluex (lowercase x) is an OSG entity, whereas GlueX is the scientific collaboration that owns it. The string Gluex:/Gluex identifies your VO (Gluex:) and within Gluex asks only for basic rights as a member (/Gluex instead of something longer like /Gluex/production or /Gluex/software/role=admin). The validity lifetime string 24:0 means that the proxy will be good for 24 hours and 0 minutes, starting from when it is approved. It should take less than 5 seconds to create or renew a proxy with the above command. You can renew it at any time. For safety, when you are done with it you can destroy it with the command voms-proxy-destroy.

Accessing stored data over SRM

The Storage Resource Manager (SRM) is a standard interface for accessing data that are stored on the grid. Individual files and directories are identified in a similar fashion to web pages, by a URL. The SRM server does not actually contain the data, but acts as a redirector that instructs the grid data transport layers where any particular item of data is stored (can be multiple places) and what the preferred protocols are for accessing it at each location. For example, imagine that the results of a grid simulation have been made available in a SRM folder called dc1.1-12-2012 that is visible at server grinch.phys.uconn.edu. If one knows the name of a data file, it can be listed with its size using a command like the following.

srmls srm://grinch.phys.uconn.edu/Gluex/dc1.1-12-2012/dana_rest_1000596.hddm

This folder could contain several hundred thousand files, so do not attempt to list the entire directory using srmls. Other grid tools exist for browsing the file catalog. Having confirmed that the above file dana_rest_1000596.hddm exists on the srm at the above location, one can then fetch a copy of it using a command like the following.

srmcp srm://grinch.phys.uconn.edu/Gluex/dc1.1-12-2012/dana_rest_1000596.hddm \
      file:///`pwd`/dana_rest_1000596.hddm