Offline Monitoring Incoming Data

From GlueXWiki
Revision as of 11:26, 14 February 2016 by Pmatt (Talk | contribs) (Launching for a new run period)

Saving Online Monitoring Data

The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.

Once the DAQ writes the data to the RAID disk, cron jobs copy each file to tape, and within ~20 min. the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
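
As an illustration of the tape path layout, the following sketch builds the directory name for a given run. The helper name rawdata_dir is hypothetical. It assumes that XXXXXX is the run number zero-padded to six digits and that RunPeriod-2016-02 is a valid value of $RUN_PERIOD; neither is confirmed on this page. It is written for a POSIX shell (sh/bash), not csh.

```shell
# Hypothetical helper: print the tape directory for one run.
# ASSUMPTION: XXXXXX is the run number zero-padded to six digits.
# Pass the run number without leading zeros (printf would read "010" as octal).
rawdata_dir() {
    run_period="$1"    # e.g. RunPeriod-2016-02 (assumed format of $RUN_PERIOD)
    run_number="$2"    # e.g. 10492
    printf '/mss/halld/%s/rawdata/Run%06d\n' "$run_period" "$run_number"
}

rawdata_dir RunPeriod-2016-02 10492   # prints /mss/halld/RunPeriod-2016-02/rawdata/Run010492
```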

All online monitoring plugins are run as data is taken. Their histograms are accessible within the counting house via RootSpy, and for each file of each run, a ROOT file containing the histograms is saved in a per-run subdirectory.

For immediate access, read the files directly from the RAID disk in the counting house; otherwise, the tape copies become available within ~20 min. of each file being written out.

Preparing the software

1. Update the environment, using the latest desired versions of JANA, the CCDB, etc. Also, the launch software will create new tags of the HDDS and sim-recon repositories, so update the version*.xml file referenced in the environment file to use the soon-to-be-created tags. This must be done BEFORE launch project creation. The environment file is at:

~/env_monitoring_incoming

2. Set up the environment. This will override the HDDS and sim-recon versions specified in the version*.xml file and use the monitoring launch working-area builds instead. Call:

source ~/env_monitoring_incoming

3. Updating & building hdds:

cd $HDDS_HOME
git pull                # Get latest software
scons -c install        # Clean out the old install: EXTREMELY IMPORTANT for cleaning out stale headers
scons install -j4       # Rebuild and re-install with 4 threads

4. Updating & building sim-recon:

cd $HALLD_HOME/src
git pull                # Get latest software
scons -c install        # Clean out the old install: EXTREMELY IMPORTANT for cleaning out stale headers
scons install -j4       # Rebuild and re-install with 4 threads

5. Create a new sqlite file containing the very latest calibration constants. The original documentation on creating sqlite files is here.

cd $GLUEX_MYTOP/../sqlite/
$CCDB_HOME/scripts/mysql2sqlite/mysql2sqlite.sh -hhallddb.jlab.org -uccdb_user ccdb | sqlite3 ccdb.sqlite
mv ccdb.sqlite ccdb_monitoring_incoming.sqlite #replacing the old file
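
Steps 3 and 4 above run the same pull, clean, and rebuild sequence in two directories. A minimal sketch of wrapping that sequence so it stops at the first failing command is shown below. The function name rebuild is hypothetical, and the sketch is written for a POSIX shell (sh/bash), not csh.

```shell
# Hypothetical helper: update and rebuild one source tree, stopping on the first error.
# It runs in a subshell so the caller's working directory is left unchanged.
rebuild() {
    (
        cd "$1" || exit 1              # source directory, e.g. $HDDS_HOME
        git pull || exit 1             # get latest software
        scons -c install || exit 1     # clean out the old install (stale headers)
        scons install -j4              # rebuild and re-install with 4 threads
    )
}

# Usage (not executed here):
#   rebuild "$HDDS_HOME" && rebuild "$HALLD_HOME/src"
```

Stopping early avoids rebuilding sim-recon against an HDDS build that failed partway through.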

Launching for a new run period

1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:

cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming
chmod 755 script.sh    #Fix the permissions!

2) Edit the job config file, ~/monitoring/incoming/input.config, which is used to register jobs in hdswif. The version # should be "01". A typical config file will look like this:

PROJECT                       gluex
TRACK                         reconstruction
OS                            centos65
NCORES                        24        # 24 = Entire node
DISK                          40
RAM                           32        # 32 GB per node
TIMELIMIT                      8
JOBNAMEBASE                   offmon
RUNPERIOD                     2016-02
VERSION                       01
OUTPUT_TOPDIR                 /cache/halld/offline_monitoring/RunPeriod-[RUNPERIOD]/ver[VERSION] # Values may reference other variables in square brackets
SCRIPTFILE                    /home/gxproj1/monitoring/incoming/script.sh                        # Must specify full path
ENVFILE                       /home/gxproj1/env_monitoring_incoming                              # Must specify full path
PLUGINS                       TAGH_online,TAGM_online,BCAL_online,CDC_online,CDC_expert,FCAL_online,FDC_online,ST_online_lowlevel,ST_online_tracking,TOF_online,PS_online,PSC_online,PSPair_online,TPOL_online,TOF_TDC_shift,monitoring_hists,danarest,BCAL_Eff,p2pi_hists,p3pi_hists,HLDetectorTiming,BCAL_inv_mass,trackeff_missing,TRIG_online,CDC_drift,RF_online,BCAL_attenlength_gainratio,BCAL_TDC_Timing
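
To make the bracketed references in OUTPUT_TOPDIR concrete, the sketch below expands [RUNPERIOD] and [VERSION] by hand with sed. It only illustrates the expected result; it is not how hdswif performs the substitution, and the function name expand_topdir is hypothetical. It assumes the values contain no "/" characters, since "/" is used as the sed delimiter.

```shell
# Hypothetical illustration of expanding [RUNPERIOD] and [VERSION] in a config value.
# ASSUMPTION: the values contain no "/" (it is the sed delimiter here).
expand_topdir() {
    template="$1"
    run_period="$2"
    version="$3"
    printf '%s\n' "$template" |
        sed -e "s/\[RUNPERIOD\]/$run_period/g" -e "s/\[VERSION\]/$version/g"
}

expand_topdir '/cache/halld/offline_monitoring/RunPeriod-[RUNPERIOD]/ver[VERSION]' 2016-02 01
# prints /cache/halld/offline_monitoring/RunPeriod-2016-02/ver01
```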

3) Create a new swif workflow for running all of the incoming data (e.g. <workflow> = offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata):

~/monitoring/hdswif/hdswif.py create <workflow> -c ~/monitoring/incoming/input.config
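
The example workflow name in step 3 appears to follow a pattern built from the run period and version. Assuming that pattern holds for other run periods (this page only shows the one example), a name could be generated as sketched below. The function name workflow_name is hypothetical.

```shell
# Hypothetical helper: build a workflow name following the single example in step 3.
# ASSUMPTION: the name is the run period with "-" replaced by "_", plus the version.
workflow_name() {
    period=$(printf '%s' "$1" | sed 's/-/_/g')   # 2016-02 -> 2016_02
    printf 'offline_monitoring_RunPeriod%s_ver%s_hd_rawdata\n' "$period" "$2"
}

workflow_name 2016-02 01   # prints offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata
```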

4) In ~/monitoring/incoming/cron_exec.csh, modify the script to run for the new run period. E.g., for 2016-02:

python ~/monitoring/incoming/process_incoming.py 2016-02 ~/monitoring/incoming/input.config 20 >& ~/incoming_log.txt

5) Before launching the cron job, run the script manually once. If there are already many files on disk, the first execution may take longer than 15 minutes, and launching it from cron could then lead to jobs being double-submitted. So, first execute the script manually:

~/monitoring/incoming/cron_exec.csh
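
One general way to guard against overlapping executions is a lock directory. This is not part of the monitoring scripts described here; it is a minimal sketch of a common shell technique, with the hypothetical function name run_locked and a lock path chosen only for illustration. It is written for a POSIX shell (sh/bash), not csh.

```shell
# Hypothetical guard: run a command only if no other guarded run is active.
# mkdir is atomic, so only one caller can create the lock directory.
run_locked() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir exists, skipping this run" >&2
        return 1
    fi
    status=0
    "$@" || status=$?      # run the command and remember its exit status
    rmdir "$lockdir"       # release the lock
    return "$status"
}

# Usage (not executed here; the lock path is an assumption):
#   run_locked "$HOME/incoming.lock" "$HOME/monitoring/incoming/cron_exec.csh"
```

If the command is killed before the lock is released, the lock directory must be removed by hand before the next run can proceed.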

6) Now that the initial batch of jobs has been submitted, launch the cron job by running:

crontab cron_incoming

7) To check whether the cron job is installed, run:

crontab -l

8) The stdout & stderr from the cron job are redirected to a log file located at:

~/incoming_log.txt

9) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):

swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>
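
The resubmit command in step 9 can be repeated for each problem type. A minimal sketch of such a loop follows; it uses only the resubmit form shown above. The function name resubmit_all and the HDSWIF override variable are hypothetical. Jobs that failed with TIMEOUT or RLIMIT may need their resources modified first, as noted in step 9, so treat this as a convenience sketch rather than a recommended procedure.

```shell
# Hypothetical helper: resubmit failed jobs for every problem type listed in step 9.
# HDSWIF is an assumed override; by default the hdswif.py path from this page is used.
resubmit_all() {
    workflow="$1"
    hdswif="${HDSWIF:-$HOME/monitoring/hdswif/hdswif.py}"
    for problem in SYSTEM TIMEOUT RLIMIT; do
        "$hdswif" resubmit "$workflow" "$problem" || return 1
    done
}

# Usage (not executed here):
#   resubmit_all offline_monitoring_RunPeriod2016_02_ver01_hd_rawdata
```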

10) To remove the cron job (e.g. at the end of the run), run the following. Note that this removes the account's entire crontab, not just this job:

crontab -r