Notes on the Data Challenge
From GlueXWiki
Revision as of 13:14, 11 July 2012
Ideas
- Curtis's note (http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2031)
- Comprehensive
- See note
- Mark's blog (http://markito3.wordpress.com/2012/06/30/ideas-for-a-data-challenge/)
- Develop system for submitting and tracking large-volume simulation and reconstruction jobs
- Matt's email (https://mailman.jlab.org/pipermail/halld-offline/2012-July/000987.html)
- Include data analysis system
- Develop/test system for delivering reconstruction data to data analyzers
- David's email (https://mailman.jlab.org/pipermail/halld-offline/2012-July/000994.html)
- two data challenges: simulation and reconstruction
- simulation: run the MC, reconstruct it, produce reconstructed data
- reconstruction: create fake raw data sample, reconstruct it
- shipping reconstructed data to two institutions
- other specific proposals
Tools
- EventStore
- PanDA
- received offer of help from Torre Wenaus
Intermediate Goal: Mini Data Challenges (reconstruction-type)
- one major problem to be solved is how to scale:
- how to generate and run thousands of jobs
- assess their status (before, during, and after they run)
- manage all output files and diagnostic data
- same issues for simulation and reconstruction: want a common framework
- with this in place we can iterate in mini-data challenges
- wrong data mix? change it
- wrong output format? change it
- wrong photon reconstruction algorithm? change it
- we want to be in a position where re-running a mini-challenge is no big deal
- in parallel develop everything else
- code correctness
- execution speed
- design and implement analysis system
- raw data generation
- planning for test bed for full data challenge
- reconstructed data format
- find bottlenecks at intermediate scale
- say by September 1 and every two weeks after that
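The common framework sketched above (generate thousands of jobs, assess their status before/during/after they run, manage output files) could start as something as simple as a job-record tracker. The sketch below is purely illustrative: the `Job`/`JobTracker` names, the state set, and the `.hddm` output filename are assumptions, not part of any existing GlueX tool.

```python
# Illustrative sketch of a common job-tracking layer for simulation and
# reconstruction jobs. All class names and states are hypothetical.
from dataclasses import dataclass, field

STATES = ("created", "submitted", "running", "done", "failed")

@dataclass
class Job:
    job_id: int
    kind: str                       # "simulation" or "reconstruction"
    state: str = "created"
    outputs: list = field(default_factory=list)

class JobTracker:
    def __init__(self):
        self.jobs = {}

    def generate(self, kind, n):
        """Create n job records of the given kind (scales to thousands)."""
        start = len(self.jobs)
        for i in range(start, start + n):
            self.jobs[i] = Job(job_id=i, kind=kind)

    def advance(self, job_id, state, output=None):
        """Record a state transition and any output file produced."""
        assert state in STATES
        job = self.jobs[job_id]
        job.state = state
        if output:
            job.outputs.append(output)

    def summary(self):
        """Counts per state -- the 'assess their status' step."""
        counts = dict.fromkeys(STATES, 0)
        for job in self.jobs.values():
            counts[job.state] += 1
        return counts

tracker = JobTracker()
tracker.generate("simulation", 1000)
tracker.generate("reconstruction", 1000)
tracker.advance(0, "done", output="sim_000000.hddm")
print(tracker.summary())
```

Because the same record works for simulation and reconstruction jobs, re-running a mini-challenge with a different data mix or output format only changes how jobs are generated, not how they are tracked.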
Analysis System
- another major problem: we don't have one
- more of a design and development effort
- can be fed by data from mini-challenges
- event format
- storage requirements/configuration
- data discovery
- user tools
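Of the design questions above, "data discovery" is the most concrete: analyzers need to find reconstructed data without knowing the storage layout. A minimal sketch, assuming a flat metadata catalog (the `DataCatalog` name, metadata keys, and file paths are all illustrative, not an existing system):

```python
# Hypothetical sketch of a data-discovery catalog: register files with
# searchable metadata, then query by any subset of keys.
class DataCatalog:
    def __init__(self):
        self._entries = []          # list of (metadata dict, file path)

    def register(self, path, **metadata):
        """Record a file along with its searchable metadata."""
        self._entries.append((metadata, path))

    def find(self, **query):
        """Return paths whose metadata matches every key in the query."""
        return [path for meta, path in self._entries
                if all(meta.get(k) == v for k, v in query.items())]

catalog = DataCatalog()
catalog.register("/mss/run001_rest.hddm", run=1, kind="reconstructed")
catalog.register("/mss/run001_raw.evio", run=1, kind="raw")
catalog.register("/mss/run002_rest.hddm", run=2, kind="reconstructed")
print(catalog.find(kind="reconstructed"))
```

A real system (e.g. EventStore, mentioned under Tools) would add versioning and event-level indexing, but the event format and storage questions above feed directly into what metadata such a catalog must carry.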
References
- PanDA proposal for system development for non-ATLAS projects
- ATLAS data challenge note
- EventStore