Offline Monitoring Incoming Data

Saving Online Monitoring Data

The procedure for writing the data out is given in, e.g., Raid-to-Silo Transfer Strategy.

Once the DAQ writes the data out to the raid disk, cron jobs copy each file to tape, and within ~20 minutes the file is accessible on tape at /mss/halld/$RUN_PERIOD/rawdata/RunXXXXXX.
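
For example, to verify that a run's files have arrived on tape (the run-period directory name and run number here are hypothetical):

ls /mss/halld/RunPeriod-2016-10/rawdata/Run010000/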

All online monitoring plugins will be run as data is taken. They will be accessible within the counting house via RootSpy, and for each run and file a ROOT file containing the histograms will be saved in a per-run subdirectory.

For immediate access to these files, the raid-disk copies may be read directly from the counting house; the tape copies become available within ~20 minutes of each file being written out.

Preparing the software

  • Do the exact same steps as detailed for the offline monitoring and reconstruction setup on the Offline_Monitoring_Archived_Data page, EXCEPT for the following.

1) Replace "monitoring_launch" with "monitoring_incoming".

2) Build the software under a different directory name (e.g. "build1") instead of "monitoring_incoming", and then create a soft link:

ln -s build1 monitoring_incoming

This way, if the software needs to be updated in the middle of the run, you simply create a new build in parallel (e.g. "build2") and then switch the symbolic link when you're ready, as shown below.
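
For example, to point the link at a new parallel build (directory names are illustrative; ln -sfn replaces the existing link in place):

ln -sfn build2 monitoring_incoming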

3) Don't create a CCDB SQLite file. One is created uniquely for each job, so that each job has the most up-to-date calibration constants.

Starting a new run period

  • Do the exact same steps as detailed in "Starting a new run period" on the Offline_Monitoring_Archived_Data page.

Launching for a new run period

1) Download the "monitoring" scripts directory from svn. For the gxprojN accounts, use the directory ~/monitoring/:

cd ~/
svn co https://halldsvn.jlab.org/repos/trunk/scripts/monitoring/
cd monitoring/incoming

2) Update the jobs_incoming.config job config file. In particular, be sure to update RUNPERIOD. Monitoring of the incoming data should always be ver01. A sketch of the fields to check follows the command below.

vi ~/monitoring/incoming/jobs_incoming.config
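
A minimal sketch of the fields to check, assuming the file follows the same key-value format as the earlier input.config (exact key names may differ):

RUNPERIOD                     2016-10
VERSION                       01
WORKFLOW                      offmon_2016-10_ver01   # assumed key name; must match the SWIF workflow created in step 4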

3) Update the jana_incoming.config JANA config file. This contains the command-line arguments given to JANA. In particular, be sure to update REST:DATAVERSIONSTRING (see the sketch below).

vi ~/monitoring/incoming/jana_incoming.config
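
A minimal sketch of the line to update, assuming JANA's usual one-key-per-line config format; the value shown is illustrative and should match the data version registered in step 8:

REST:DATAVERSIONSTRING        incoming_2016-10_ver01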

4) Create the SWIF workflow. The workflow should have a name like "offmon_2016-10_ver01", and it must match the workflow name in the job config file (e.g. jobs_incoming.config).

swif create -workflow <my_workflow>
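
For example, for the 2016-10 run period:

swif create -workflow offmon_2016-10_ver01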

5) Modify ~/monitoring/incoming/cron_exec.csh to run for the new run period (e.g. 2016-10):

~/monitoring/incoming/cron_exec.csh

6) Before launching the cron job, run the script manually first. If many files are already on disk, the first execution can take longer than the 15-minute cron interval, in which case jobs may be double-submitted! To avoid this, first execute the python script manually (this submits jobs for the first 5 files (000 -> 004) of every run that is on /mss/ but has not yet been submitted):

python ~/monitoring/incoming/submit_jobs.py 2016-10 ~/monitoring/incoming/jobs_incoming.config 5 >& ~/incoming_log.txt

7) Update the post-processing script for the new run period:

~/monitoring/process/check_monitoring_data.csh

8) Add the incoming data to the data version database:

~/monitoring/process/register_new_version.py add ~/monitoring/process/version/incoming_2016-10_ver01

9) Check that the cron daemon is running on that node:

ps aux | grep crond

10) Now that the initial batch of jobs has been submitted, launch the cron job by running:

crontab cron_incoming
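
The cron_incoming file is a standard crontab table. A plausible sketch of its contents, assuming the 15-minute interval implied by step 6 and the log file from step 12 (the actual entry may differ):

# illustrative crontab entry (assumed): run the incoming script every 15 minutes
*/15 * * * * /u/home/gxproj1/monitoring/incoming/cron_exec.csh >> /u/home/gxproj1/incoming.log 2>&1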

11) To check whether the cron job is running (on the same machine where you launched the cron job, i.e. for CentOS7: ifarm1401 or ifarm1402), do

crontab -l

12) The stdout & stderr from the cron job are piped to log files located at:

~/incoming.log

and

~/check.log
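
To follow both logs while the cron job runs:

tail -f ~/incoming.log ~/check.log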

13) Periodically check how the jobs are doing, and modify and resubmit failed jobs as needed (where <problem> can be one of SYSTEM, TIMEOUT, RLIMIT):

swif status <workflow>
~/monitoring/hdswif/hdswif.py resubmit <workflow> <problem>
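
For example, to check on the workflow from step 4 and resubmit jobs that hit the time limit:

swif status offmon_2016-10_ver01
~/monitoring/hdswif/hdswif.py resubmit offmon_2016-10_ver01 TIMEOUT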

14) To remove the cron job (e.g. at the end of the run), do

crontab -r