Difference between revisions of "Spring 2016 Tape Notes"

From GlueXWiki
 
Line 3:

  We are seeing the following behavior from the first two calibration runs over "production" data:
  - If we cache all files from tape beforehand using jcache, we get reasonable performance.  We have to be sure to only cache enough data that will fit on the disk

Old revision:
  - If we only cache the first file from each run, then processing is much slower, and we get hit by a "long tail" of jobs which seem to take forever to load the files from tape (multiple days).

New revision:
  - If we only cache the first file from each run, then processing is slower, and we can see a "long tail" of jobs.

  Note that we are also running monitoring jobs over the data as it comes in.

Latest revision as of 18:11, 9 March 2016

2016/3/3

We are seeing the following behavior from the first two calibration runs over "production" data:
- If we cache all files from tape beforehand using jcache, we get reasonable performance. We have to be sure to cache only as much data as will fit on the disk.
- If we only cache the first file from each run, then processing is slower, and we can see a "long tail" of jobs.

Note that we are also running monitoring jobs over the data as it comes in.
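
As a rough illustration of "cache only as much data as will fit on the disk", the selection step could look something like the sketch below. This is only a sketch under assumptions: the function name `select_files_within_budget`, the `(path, size)` input format, and the byte budget are all hypothetical and not part of our actual scripts, and the code does not call jcache itself. It just picks which files one would request.

```python
from typing import Iterable, List, Tuple


def select_files_within_budget(
    files: Iterable[Tuple[str, int]], budget_bytes: int
) -> List[str]:
    """Return file paths, in input order, whose cumulative size fits the budget.

    `files` is a sequence of (path, size_in_bytes) pairs (hypothetical format).
    Selection stops at the first file that would exceed `budget_bytes`, so the
    files of a run are kept in order rather than skipping ahead to smaller ones.
    """
    selected: List[str] = []
    used = 0
    for path, size in files:
        if used + size > budget_bytes:
            break  # the next file would overflow the cache disk budget
        selected.append(path)
        used += size
    return selected


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    example = [("run_a_000.evio", 4), ("run_a_001.evio", 4), ("run_a_002.evio", 4)]
    print(select_files_within_budget(example, budget_bytes=10))
```

The list it returns would then be handed to whatever caching step is in use. Stopping at the first file that does not fit (rather than skipping it) is a design choice made here to keep files in run order; other policies are possible.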


Managing our cache disk usage ourselves during production running will not be practical, since we will be running three separate workflows from three accounts simultaneously:

  1. Monitoring of data as it hits the tape
  2. Calibration of data
  3. Initial reconstruction of full runs