Notes on the Data Challenge

From GlueXWiki

Ideas

  • Curtis's note
    • Comprehensive
    • See note
  • Mark's blog
    • Develop system for submitting and tracking large-volume simulation and reconstruction jobs
  • Matt's email
    • Include data analysis system
    • Develop/test system for delivering reconstruction data to data analyzers
  • David's email
    • two data challenges: simulation and reconstruction
      • simulation: run the MC, reconstruct it, produce reconstructed data
      • reconstruction: create fake raw data sample, reconstruct it
    • shipping reconstructed data to two institutions
    • other specific proposals

Tools

  • EventStore
  • PanDA
    • received offer of help from Torre Wenaus

Intermediate Goal: Mini Data Challenges (reconstruction-type)

  • one major problem to be solved is how to scale:
    • how to generate and run thousands of jobs
    • assess their status (before, during, and after they run)
    • manage all output files and diagnostic data
    • same issues for simulation and reconstruction: want a common framework
  • with this in place we can iterate in mini-data challenges
    • wrong data mix?: change it
    • wrong output format?: change it
    • wrong photon reconstruction algorithm?: change it
  • we want to be in a position where re-running a mini-challenge is not a big deal
  • in parallel develop everything else
    • code correctness
    • execution speed
    • design and implement analysis system
    • raw data generation
    • planning for test bed for full data challenge
    • reconstructed data format
  • find bottlenecks at intermediate scale
  • schedule: first mini-challenge by, say, September 1, then another every two weeks after that
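
As a rough illustration of the scaling problem listed above (generating thousands of jobs, assessing their status before, during, and after they run, and managing their output and diagnostic files), here is a minimal Python sketch of the bookkeeping a common framework would need. It is not a proposal for the actual system: the class names, status values, file names, and the photon_algorithm setting are all hypothetical, and a real implementation would sit on top of a batch system or a tool such as PanDA rather than an in-memory dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"   # generated but not yet started
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Job:
    job_id: int
    kind: str             # "simulation" or "reconstruction"
    config: dict          # hypothetical per-job settings
    status: Status = Status.PENDING
    outputs: list = field(default_factory=list)  # output and diagnostic file names


class Challenge:
    """Hypothetical bookkeeping for one mini data challenge."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}

    def generate(self, kind, n_jobs, **config):
        # Same code path for simulation and reconstruction jobs.
        start = len(self.jobs)
        for job_id in range(start, start + n_jobs):
            self.jobs[job_id] = Job(job_id, kind, dict(config))

    def update(self, job_id, status, outputs=()):
        # Record a status change and any files the job produced.
        job = self.jobs[job_id]
        job.status = status
        job.outputs.extend(outputs)

    def summary(self):
        # Count jobs per status, usable at any point in the challenge.
        return Counter(job.status.value for job in self.jobs.values())

    def failed_jobs(self):
        return [j.job_id for j in self.jobs.values() if j.status is Status.FAILED]


challenge = Challenge("mini-01")
challenge.generate("reconstruction", 1000, photon_algorithm="default")
challenge.update(0, Status.DONE, ["rec_0000.out", "rec_0000.log"])
challenge.update(1, Status.FAILED, ["rec_0001.log"])
print(challenge.summary())
print(challenge.failed_jobs())
```

Keeping the configuration attached to each job is what would make a re-run cheap in this sketch: changing the data mix, output format, or reconstruction algorithm amounts to generating a new challenge with a different config.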

Analysis System

  • another major problem: we don't have one
  • more of a design and development effort
  • can be fed by data from mini-challenges
    • event format
    • storage requirements/configuration
    • data discovery
    • user tools

References

  • PanDA proposal for system development for non-ATLAS projects
  • ATLAS data challenge note
  • EventStore