GlueX Data Challenge Meeting, July 16, 2012


Meeting Time and Place

The meeting will be on Monday July 16, 2012 at 1:30pm EDT. For those people at Jefferson Lab, the meeting will be in room F326.

Meeting Connections

To connect from the outside:

1.) ESNET:

   Call ESNET Number 8542553 (this is the preferred connection method).

2.) Phone: (should not be needed)

  +1-866-740-1260 : US and Canada
  +1-303-248-0285 : International
    then use participant code: 3421244# (the # is needed when using the phone)
  or www.readytalk.com
   then type access code 3421244 into "join a meeting" (requires the Java plugin)

3.) EVO:

   A conference has been booked under "GlueX" from 1:00pm until 3:30pm (EDT).
  • To phone into an EVO meeting, from the U.S. call (626) 395-2112 and then enter the EVO meeting code, 13 9993. Instructions for the Phone Bridge to EVO.
  • Skype Bridge to EVO

Agenda

  1. announcements
    1. Computer Center has been warned: jobs, tapes
  2. discussion of agenda for this meeting
    1. Mark's list of topics
  3. review of minutes from previous offline meeting
  4. scope of this challenge
    1. document of scope: Curtis's document, with on-going revision
    2. time-line
    3. grid vs. JLab
  5. mini-data challenges?
    1. appropriate tools: WMS? Others?
    2. example of a minimal framework for job management: Mark
  6. analysis system design plan
    1. ideas for analysis system: Paul M.
  7. future challenges
  8. organization
    1. meeting time confirmation (or change)
    2. subsequent meetings open to all interested collaborators?
  9. finalization of REST format: record size and performance numbers
  10. gridmake database entry

Attendance

Present:

  • CMU: Paul Mattione, Curtis Meyer
  • IU: Matt Shepherd
  • JLab: Eugene Chudakov, Mark Ito, David Lawrence
  • UConn: Richard Jones

Minutes

Announcements

Mark reported that the Computer Center was given a heads-up that we will start working on the data challenge. Batch jobs will start to appear on the JLab farm. We will have a tape volume set that can be recycled.

Scope of this Challenge

  • We agreed to continue using Curtis's document as a repository of our ideas about the data challenge (DC). Curtis agreed to convert document 2031 into a Wiki page. The page is now available as Planning for The Next GlueX Data Challenge.
  • Curtis reminded us that we need a robust storage resource manager (SRM) for the DC.
  • Grid and JLab: Mark asked whether we want to pursue large-scale production on the Grid, at JLab, or both. We decided to pursue both.
  • Matt thought that a huge sample of Pythia data would be sufficient to address the main goals of the DC. Physics signals are of a smaller scale and can be generated in a more ad hoc manner.
  • We talked about December or January as a tentative time frame for doing the first DC. In the future we will have to set up more formal milestones.

Mini-Data Challenges

It looks like the workload management system (WMS) packages that were recommended to us may not be appropriate for tracking jobs at JLab. They are oriented toward a grid-based environment.

Mark described a Perl script, jproj.pl, that he wrote to manage jobs on the JLab farm for the PrimEx experiment. It uses a MySQL database to keep track of jobs and output files and handles multiple job submission. It is driven by a list of input files to be processed. Multiple, simultaneous projects are supported. Some assumptions about filenames and processing conventions are made to simplify the script. He will use a modified version to get started processing multiple jobs at JLab.
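
As a rough illustration of the bookkeeping involved, the sketch below shows a jproj.pl-style workflow in Python, with sqlite3 standing in for MySQL. The table layout and all column and function names are invented for illustration; they are not taken from the actual script.

  import sqlite3

  conn = sqlite3.connect("jproj.db")
  conn.execute("""
      CREATE TABLE IF NOT EXISTS jobs (
          project     TEXT NOT NULL,   -- multiple simultaneous projects supported
          input_file  TEXT NOT NULL,   -- the workflow is driven by a file list
          status      TEXT NOT NULL DEFAULT 'pending',
          output_file TEXT,
          PRIMARY KEY (project, input_file)
      )""")

  def register_inputs(project, input_files):
      # Load the list of input files that drives the whole project.
      conn.executemany(
          "INSERT OR IGNORE INTO jobs (project, input_file) VALUES (?, ?)",
          [(project, f) for f in input_files])
      conn.commit()

  def next_batch(project, limit=25):
      # Pick pending files for the next round of farm-job submissions.
      rows = conn.execute(
          "SELECT input_file FROM jobs WHERE project = ? AND status = 'pending' "
          "LIMIT ?", (project, limit)).fetchall()
      return [r[0] for r in rows]

  def mark_done(project, input_file, output_file):
      # Record the output file once a job finishes successfully.
      conn.execute(
          "UPDATE jobs SET status = 'done', output_file = ? "
          "WHERE project = ? AND input_file = ?",
          (output_file, project, input_file))
      conn.commit()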

Richard reminded us that his gridmake system offers similar functionality. Mark agreed to look at it as a possible, more sophisticated replacement for jproj.pl.

At the last offline meeting Mark described the idea of doing multiple, scheduled, medium-scale data challenges as a development path for the tools needed to do a large-scale DC. The idea is to expand the scope as we go from mini-DC to mini-DC, testing ideas as we go. There was a consensus around following this approach, at least initially.

Analysis System Design Plan

Paul presented some classes for automating the selection of particle combinations and doing kinematic fits on those combinations, with the reaction to be studied specified when constructing the class. See his wiki page for details. He has other related ideas which he will be developing in the near future.
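
To make the construction-time idea concrete, here is a minimal sketch, in Python, of a class that takes the reaction's final state as a constructor argument and then enumerates candidate particle combinations for each event. All names here are invented for illustration; Paul's actual classes are C++, live in the GlueX framework, and also run kinematic fits on the combinations.

  class ReactionCombiner:
      # Construct with the reaction's final state; the object then knows
      # how to build candidate combinations for any event.
      def __init__(self, final_state):
          self.final_state = final_state  # e.g. ["proton", "pi+", "pi-"]

      def combos(self, tracks):
          # tracks: list of (pid, track) pairs reconstructed in one event.
          # Yields every assignment of distinct tracks to the final state.
          # A real implementation would also skip permutations of identical
          # particles and kinematically fit each surviving combination.
          def build(slot, chosen, used):
              if slot == len(self.final_state):
                  yield tuple(chosen)
                  return
              want = self.final_state[slot]
              for i, (pid, trk) in enumerate(tracks):
                  if pid == want and i not in used:
                      yield from build(slot + 1, chosen + [trk], used | {i})
          yield from build(0, [], set())

  # Usage: one combiner per reaction, reused across events.
  combiner = ReactionCombiner(["proton", "pi+", "pi-"])
  event = [("proton", "trk0"), ("pi+", "trk1"), ("pi+", "trk2"), ("pi-", "trk3")]
  for combo in combiner.combos(event):
      print(combo)  # ('trk0', 'trk1', 'trk3') and ('trk0', 'trk2', 'trk3')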

Matt proposed that we start doing analysis with simple, individually developed scripts and incrementally develop system(s) based on that experience.

Finalization of REST Format: record size and performance numbers

Richard discussed the changes he made to the REST format based on comments from the last offline meeting and subsequent conversations with Paul. He made the switch from using DChargedTrackHypothesis to DTrackTimeBased. He sees a performance hit when the change is made due to the need to swim a trajectory. The reconstitution rate is 30 Hz for the events he studied, compared with a reconstruction rate (from raw hits) of 1.8 Hz. We agreed that the flexibility in analysis with the new scheme was worth the extra processing time.

Action Items

  1. Make a list of milestones.
  2. Do a micro-data challenge with jproj.pl. -> Mark
  3. Check in the final REST format. -> Richard