Notes on the Data Challenge

From GlueXWiki

Latest revision as of 17:13, 24 February 2017

Ideas

  • Curtis's note
    • Comprehensive
    • See note
  • Mark's blog
    • Develop system for submitting and tracking large-volume simulation and reconstruction jobs
  • Matt's email
    • Include data analysis system
    • Develop/test system for delivering reconstruction data to data analyzers
  • David's email
    • two data challenges: simulation and reconstruction
      • simulation: run the MC, reconstruct it, produce reconstructed data
      • reconstruction: create fake raw data sample, reconstruct it
    • shipping reconstructed data to two institutions
    • other specific proposals

Tools

  • EventStore
  • PanDA
    • received offer of help from Torre Wenaus

Intermediate Goal: Mini Data Challenges (reconstruction-type)

  • one major problem to be solved is how to scale:
    • how to generate and run thousands of jobs
    • assess their status (before, during, and after they run)
    • manage all output files and diagnostic data
    • same issues for simulation and reconstruction: want a common framework
  • with this in place we can iterate in mini-data challenges
    • wrong data mix?: change it
    • wrong output format?: change it
    • wrong photon reconstruction algorithm?: change it
  • we want to be in a position where re-running a mini-challenge is no big deal
  • in parallel develop everything else
    • code correctness
    • execution speed
    • design and implement analysis system
    • raw data generation
    • planning for test bed for full data challenge
    • reconstructed data format
  • find bottlenecks at intermediate scale
  • tentative schedule: first mini-challenge by September 1, then one every two weeks after that
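The scaling problem above (generate thousands of jobs, track their status before, during, and after they run, and keep track of their output and diagnostic files, with one framework shared by simulation and reconstruction) can be illustrated with a minimal sketch. This is a hypothetical toy, not PanDA or any existing GlueX tool: the class names, the state list, and the parameter keys (`data_mix`, `output_format`) are assumptions made for illustration only. The point is that a mini-challenge is defined by one configuration, so changing the data mix or output format and re-running is cheap.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    # Hypothetical lifecycle covering "before, during, and after" a job runs.
    DEFINED = "defined"
    SUBMITTED = "submitted"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed state transitions; any other transition is rejected.
TRANSITIONS = {
    JobState.DEFINED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: {JobState.SUBMITTED},  # failed jobs may be resubmitted
}


@dataclass
class Job:
    job_id: int
    kind: str     # "simulation" or "reconstruction": one framework for both
    params: dict  # per-job configuration derived from the mini-challenge config
    state: JobState = JobState.DEFINED
    outputs: list = field(default_factory=list)    # data files produced
    log_files: list = field(default_factory=list)  # diagnostic files produced


class JobTracker:
    """Hypothetical in-memory tracker; a real system would use a database."""

    def __init__(self):
        self.jobs = {}

    def generate(self, kind, n_jobs, base_params):
        """Create n_jobs job records from one mini-challenge configuration."""
        start = len(self.jobs)  # job ids are never reused
        new_ids = list(range(start, start + n_jobs))
        for i in new_ids:
            params = dict(base_params, job_index=i)
            self.jobs[i] = Job(job_id=i, kind=kind, params=params)
        return new_ids

    def transition(self, job_id, new_state):
        """Move a job to new_state, raising ValueError if not allowed."""
        job = self.jobs[job_id]
        if new_state not in TRANSITIONS[job.state]:
            raise ValueError(
                f"job {job_id}: {job.state.value} -> {new_state.value} not allowed"
            )
        job.state = new_state

    def record_output(self, job_id, path, is_log=False):
        """Attach an output or diagnostic file to a job."""
        job = self.jobs[job_id]
        (job.log_files if is_log else job.outputs).append(path)

    def summary(self):
        """Count jobs in each state."""
        return Counter(job.state.value for job in self.jobs.values())


if __name__ == "__main__":
    tracker = JobTracker()
    ids = tracker.generate(
        "reconstruction", 1000, {"data_mix": "mix_A", "output_format": "format_v1"}
    )
    for j in ids:
        tracker.transition(j, JobState.SUBMITTED)
    print(tracker.summary())  # Counter({'submitted': 1000})
```

Rejecting illegal transitions up front keeps the status bookkeeping trustworthy when thousands of jobs report back asynchronously; allowing FAILED to return to SUBMITTED is one possible resubmission policy, not a requirement from these notes.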

Analysis System

  • another major problem: we don't have one
  • more of a design and development effort
  • can be fed by data from mini-challenges
    • event format
    • storage requirements/configuration
    • data discovery
    • user tools
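To make the "data discovery" and "storage requirements" items above concrete, here is a minimal sketch of a metadata catalog that the mini-challenges could populate: each output file is registered with a few attributes, and analyzers query by those attributes. This is a hypothetical toy for illustration, not the EventStore API; the field names (`run`, `data_type`, `format_version`, `size_bytes`) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    run: int
    data_type: str       # hypothetical label, e.g. "simulation" or "reconstruction"
    format_version: str  # hypothetical tag for the reconstructed-event format
    size_bytes: int


class Catalog:
    """Hypothetical in-memory file catalog keyed by metadata."""

    def __init__(self):
        self._records = []

    def register(self, record):
        """Add one output file (e.g. reported by a finished job)."""
        self._records.append(record)

    def find(self, **criteria):
        """Return records matching every keyword given.

        Unknown attribute names raise AttributeError.
        """
        return [
            r for r in self._records
            if all(getattr(r, key) == value for key, value in criteria.items())
        ]

    def total_size(self, **criteria):
        """Storage used by the matching files, in bytes."""
        return sum(r.size_bytes for r in self.find(**criteria))
```

For example, `catalog.find(data_type="reconstruction", format_version="v2")` would return only files written in that event-format version, and `total_size` with the same criteria gives the storage they occupy.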

References

  • PanDA proposal for system development for non-ATLAS projects
  • ATLAS data challenge note
  • EventStore