Offline Monitoring Incoming Data

== Saving Online Monitoring Data ==

The procedure for writing the data out is given in, e.g., [[Raid-to-Silo Transfer Strategy]].

Once the DAQ writes the data out to the RAID disk, cron jobs copy each file to tape, and within ~20 min. the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
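
For example, to list a run's raw-data files once they are on tape (the run number below is a placeholder, and $RUN_PERIOD is assumed to be set to the current run period):
<pre>
ls /mss/halld/$RUN_PERIOD/rawdata/Run011529/
</pre>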

All online monitoring plugins will be run as data is taken. They will be accessible within the counting house via RootSpy, and for each run and file, a ROOT file containing the histograms will be saved in a subdirectory for that run.

For immediate access to these files, the RAID-disk copies may be read directly from the counting house; the tape copies become available within ~20 min. of each file being written out.

== Preparing the software ==

* Do the exact same steps as detailed for the [[Offline_Monitoring_Archived_Data | offline monitoring and reconstruction setup]], EXCEPT for the following.

'''1)''' Replace "monitoring_launch" with "monitoring_incoming".

'''2)''' Build the software under a different directory name (e.g. "build1") instead of "monitoring_incoming", and then create a soft link:

<pre>
ln -s build1 monitoring_incoming
</pre>

This way, if the software needs to be updated in the middle of the run, you just create a new build in parallel (e.g. "build2") and then switch the symbolic link when you're ready, as sketched below.
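
A minimal sketch of the mid-run swap, assuming the builds sit side by side in the same parent directory (the directory names are illustrative):
<pre>
# Build the new version next to the one currently in use
mkdir build2 && cd build2
# ... check out and build the software here ...
cd ..

# Repoint the link in one step: -f replaces the existing link, and -n
# treats the existing link itself as the target (instead of creating
# build2 inside the directory the old link points to)
ln -sfn build2 monitoring_incoming
</pre>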

'''3)''' Don't create a CCDB SQLite file. One is created fresh for each job, so that each job has the most up-to-date calibration constants.

== Starting A New Run Period ==

* Do the exact same steps as detailed in "Starting a new run period" at [[Offline_Monitoring_Archived_Data|Offline Monitoring of Archived Data]].

== Launching for a new run period ==

'''1)''' Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:

<pre>
cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming
</pre>

'''2)''' Update the '''jobs_incoming.config''' job config file. In particular, be sure to update '''RUNPERIOD'''. Monitoring of the incoming data should always be '''ver01'''.

<pre>
vi ~/monitoring/incoming/jobs_incoming.config
</pre>
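
For illustration, a hypothetical fragment in the key-value style an earlier version of this config used (the key names, in particular WORKFLOW, are assumptions; check the actual file for the real keys):
<pre>
# Placeholder values -- edit to match the new run period
RUNPERIOD                     2016-10
VERSION                       01                     # incoming monitoring is always ver01
WORKFLOW                      offmon_2016-10_ver01   # assumed key; must match the SWIF workflow name
</pre>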

'''3)''' Update the '''jana_incoming.config''' JANA config file. This contains the command-line arguments given to JANA. In particular, be sure to update '''REST:DATAVERSIONSTRING'''.

<pre>
vi ~/monitoring/incoming/jana_incoming.config
</pre>
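
As an illustration, the relevant line might look like the following (the value shown is a placeholder; follow the format of the existing entry in the file):
<pre>
# Placeholder value -- tags the REST output with the run period and version
REST:DATAVERSIONSTRING offmon_2016-10_ver01
</pre>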

'''4)''' Create the SWIF workflow. The workflow should have a name like "offmon_2016-10_ver01", and it should match the workflow name in the job config file (jobs_incoming.config).

<pre>
swif create -workflow <my_workflow>
</pre>
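
For example, for the 2016-10 run period:
<pre>
swif create -workflow offmon_2016-10_ver01
</pre>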

'''5)''' Modify ~/monitoring/incoming/cron_exec.csh to run for the new run period, e.g. for 2016-02:

<pre>
~/monitoring/incoming/cron_exec.csh
</pre>

'''6)''' Before launching the cron job, manually run the script once. If there are already a lot of files on disk, the first execution may take longer than 15 minutes, and the cron job could then double-submit jobs! So, first execute the python script manually (this submits jobs for the first 5 files (000 -> 004) of every run that is on /mss/ but hasn't been submitted yet):

<pre>
python ~/monitoring/incoming/submit_jobs.py 2016-10 ~/monitoring/incoming/jobs_incoming.config 5 >& ~/incoming_log.txt
</pre>

'''7)''' Update the post-processing script for the new run period:

<pre>
~/monitoring/process/check_monitoring_data.csh
</pre>

'''8)''' Add the incoming data to the data version database:

<pre>
~/monitoring/process/register_new_version.py add ~/monitoring/process/version/incoming_2016-10_ver01
</pre>

'''9)''' Check that the cron daemon is running on that node:

<pre>
ps aux | grep crond
</pre>

'''10)''' Now that the initial batch of jobs has been submitted, launch the cron job by running:

<pre>
crontab cron_incoming
</pre>
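
For reference, a minimal sketch of what the cron_incoming file might contain, assuming a 15-minute schedule (to match the warning in step 6) and the log file named in step 12; the file checked out from svn is authoritative:
<pre>
# Placeholder entry -- submit jobs for newly arrived files every 15 minutes
*/15 * * * * ~/monitoring/incoming/cron_exec.csh > ~/incoming.log 2>&1
</pre>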

'''11)''' To check whether the cron job is running (on the same machine where you launched the cron job, i.e. for CentOS7: ifarm1401 or ifarm1402), do:

<pre>
crontab -l
</pre>

'''12)''' The stdout & stderr from the cron job are piped to log files located at:

<pre>
~/incoming.log
</pre>

and

<pre>
~/check.log
</pre>

'''13)''' Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of '''SYSTEM''', '''TIMEOUT''', '''RLIMIT'''):

<pre>
swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>
</pre>
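
For example, to check the workflow from step 4 and resubmit its jobs that hit the time limit:
<pre>
swif status offmon_2016-10_ver01
~/monitoring/hdswif/hdswif.py resubmit offmon_2016-10_ver01 TIMEOUT
</pre>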

'''14)''' To remove the cron job (e.g. at the end of the run), do:

<pre>
crontab -r
</pre>