GlueX Data Challenge Meeting, March 28, 2014


Meeting Information

GlueX Data Challenge Meeting
Friday, March 28, 2014
11:00 am, EDT
JLab: CEBAF Center, F326

Connection Using Bluejeans

  1. To join via Polycom room system go to the IP Address: ( and enter the meeting ID: 531811405.
  2. To join via a Web Browser, go to the page [1]
  3. To join via phone, use one of the following numbers and the Conference ID: 531811405
    • US or Canada: +1 408 740 7256 or
    • US or Canada: +1 888 240 2560
  4. More information on connecting to bluejeans is available.


Agenda

  1. Announcements
  2. Status reports from sites
  3. Discussion on moving forward.
    • Data Storage Location(s)
    • Reduction in the number of files?
    • Skimming files for physics?
  4. Weekly data challenges.
  5. Running a data challenge from the mass storage system.
  6. AOT



Attendance

  • CMU: Paul Mattione, Curtis Meyer
  • FSU: Volker Crede, Aristeidis Tsaris
  • JLab: Mark Ito (chair), Sandy Philpott, Simon Taylor
  • IU: Kei Moriya
  • MIT: Justin Stevens
  • NU: Sean Dobbs, Kam Seth


Announcements

Mark recapped the story of the optional versions of JANA and HDDS that were discussed on the email list early in the week. They are listed on the conditions page.

Status reports from sites


JLab

Mark gave the report. About 1000 cores were added on Monday. Our share is now roughly 1400 cores.

Sandy asked if we could use more cores and the answer was yes. She also mentioned that the power outage in the Data Center has been pushed back to late April and thus will not interfere with the challenge.

Mark discussed recent progress with job tracking at JLab. Chris Hewitt showed him how to access the Auger database via a web service and now that information can be downloaded to a GlueX-accessible database. Mark showed some example tables. There are a lot of diagnostic statistics that can be derived from this information.


OSG

Sean reported that many of the OSG sites have a much leaner mix of installed packages than anticipated, which has slowed progress. The CERNVM disk export is working and ready to go.


MIT

Justin gave the report.

The FutureGrid component is up to 180 cores now, and more may be coming.


CMU

Paul gave the report. 384 cores have been reserved for three weeks.


FSU

Aristeidis gave the report. FSU is running on 144 cores.


Curtis did a quick tally:

run     EM bkgd. (γ/s in coh. peak)     events (millions)
9001    1e7                             350
9002    5e7                             100
9003    0                               250
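
As a quick cross-check of the tally, the per-run counts can be summed. This is only an illustrative sketch: the dictionary layout and variable names are assumptions, and the numbers are copied from the table above.

```python
# Per-run tally from the minutes: run -> (EM background rate in gamma/s
# in the coherent peak, events produced so far in millions).
# The data structure and names are illustrative, not part of any GlueX tool.
tally = {
    9001: (1e7, 350),
    9002: (5e7, 100),
    9003: (0.0, 250),
}

# Sum the event counts (in millions) over all runs.
total_events_millions = sum(events for _rate, events in tally.values())

for run, (rate, events) in sorted(tally.items()):
    print(f"run {run}: EM bkgd {rate:.0e} gamma/s, {events} M events")
print(f"total: {total_events_millions} M events")
```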

Discussion on moving forward

Curtis led the discussion on several topics.

Data Storage Location(s)

Curtis remarked that now that we are producing this data, we need to decide where to store it so that it can be used effectively. Sean reminded us that our current plan is to store it at UConn and Northwestern, with access via the OSG SRM. This plan is acceptable to all collaborating institutions. Unfortunately, JLab cannot currently participate in this activity, either by contributing to the data set or by accessing it, given our current knowledge base, which no doubt needs expansion.

Reduction in the number of files?

We discussed whether we should try to merge job output to reduce the file-management load. One practical issue is that event numbers are not unique within a given run; only the combination of file number and event number is unique. There was no consensus on the need to combine. We may take it up at a later date.
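
To illustrate the uniqueness issue, here is a minimal sketch of what a merge would have to preserve. It assumes each job's output can be summarized as a list of event numbers indexed by file number; the function names and the in-memory representation are hypothetical and do not correspond to any existing GlueX tool or file format.

```python
from collections import Counter


def merge_job_outputs(run, job_outputs):
    """Merge per-file event lists into one list of (run, file, event) keys.

    job_outputs maps a file number to the list of event numbers in that file.
    Keeping the file number in the key preserves uniqueness after merging.
    """
    merged = []
    for file_number, event_numbers in sorted(job_outputs.items()):
        for event_number in event_numbers:
            merged.append((run, file_number, event_number))
    return merged


def duplicate_keys(keys):
    """Return the keys that occur more than once."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


# Two files from the same run, each restarting its event numbering at 1.
outputs = {1: [1, 2, 3], 2: [1, 2, 3]}
merged = merge_job_outputs(9001, outputs)

# (run, file, event) is unique; (run, event) alone is not.
full_key_duplicates = duplicate_keys(merged)
run_event_duplicates = duplicate_keys([(r, e) for r, _f, e in merged])
print(len(full_key_duplicates), len(run_event_duplicates))
```

In this toy case the full key has no duplicates, while dropping the file number makes every event number collide, which is why a merged file would need to carry the original file number or be renumbered.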

Skimming files for physics?

Curtis proposed a skim that would be useful for a lot of people. Paul noted the difficulty in balancing general usefulness versus worthwhile reduction in data volume. Justin noted that the speed-up due to not re-swimming tracks from the REST output changes the calculation. Kei noted that we have not settled on what might be most useful. Paul told us that the functionality in the code to do the skims is already there. Mark remarked that we might want to wait until a demand arises before making a decision. Paul mentioned that the EventStore solution might also be driven by such a demand. Again, we need to discuss this further.

Future Work, Near-Term and Far

  • Weekly data challenges.
  • Running a data challenge from the mass storage system at JLab.
  • Reduction of processing devoted to electromagnetic background generation.
  • Monitoring quality of the current data challenge.
  • File transfers in and out of JLab.


Justin asked about the time scale for this data challenge. Another week is clearly necessary. Curtis pointed out that until the OSG processing rate is known, it is hard to make a call on this. Recall that last time (i.e., for DC1) the OSG contributed about 80% of the total processing.