Difference between revisions of "CMU Data Challenge 2"

From GlueXWiki
Line 8:
  #* Running smoothly since ~Tuesday.
  # As of 10:30am, 1174 jobs have completed.
- #* 9001 Series - 859    1E7 with EM Background : 21.47 MEvents : 1 failure (DMagneticFieldMapFineMesh::GetFieldAndGradient())
+ #* 9001 Series - 859    1E7 with EM Background (25k Events Each) : 21.47 MEvents : 1 failure (DMagneticFieldMapFineMesh::GetFieldAndGradient())
- #* 9002 Series - 225    5E7 with EM Background : 2.25 MEvents : 0 failures
+ #* 9002 Series - 225    5E7 with EM Background (10k Events Each) : 2.25 MEvents : 0 failures
- #* 9003 Series -  90    without EM Background  : 4.45 MEvents : 1 failure (Job lost to the aether)
+ #* 9003 Series -  90    without EM Background  (50k Events Each) : 4.45 MEvents : 1 failure (Job lost to the aether)

Revision as of 11:31, 28 March 2014

  1. At CMU we are using 12 boxes, each with four 8-core AMD Opteron processors (32 cores per box). Each box has 64 GB of physical memory. Data are being written to a local RAID disk. Jobs are managed by PBS (TORQUE and Maui).
  2. All 384 cores are reserved for the data challenge for three weeks.
  3. Did not switch to optional version.
  4. Start-up Problems
    • All jobs were initially reading from the same copy of sqlite, resources, and hdds, instead of having their own copies.
    • Large-cluster configuration problems slowed our start. Resolved by tuning PBS parameters to control the rate at which pbs_mom talked to the head node.
    • Still battling a scheduler issue, but a workaround has been found.
    • Running smoothly since ~Tuesday.
  5. As of 10:30am, 1174 jobs have completed.
    • 9001 Series - 859 1E7 with EM Background (25k Events Each) : 21.47 MEvents : 1 failure (DMagneticFieldMapFineMesh::GetFieldAndGradient())
    • 9002 Series - 225 5E7 with EM Background (10k Events Each) : 2.25 MEvents : 0 failures
    • 9003 Series - 90 without EM Background (50k Events Each) : 4.45 MEvents : 1 failure (Job lost to the aether)
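
The tallies above can be cross-checked with a short script. This is only a sketch: it assumes "(25k Events Each)" means events per job, and the name `nominal_mevents` is introduced here for illustration. It computes what each series would total if every listed job had produced its full event count, next to the figure reported on this page.

```python
# Figures copied from the status list above:
# (series, completed jobs, events per job, reported MEvents)
# Assumption: "Events Each" is read as events per job.
SERIES = [
    ("9001", 859, 25_000, 21.47),
    ("9002", 225, 10_000, 2.25),
    ("9003", 90, 50_000, 4.45),
]

BOXES = 12
CORES_PER_BOX = 32


def nominal_mevents(jobs, events_per_job):
    """Millions of events if every job produced its full event count."""
    return jobs * events_per_job / 1e6


total_cores = BOXES * CORES_PER_BOX
total_jobs = sum(jobs for _, jobs, _, _ in SERIES)
total_reported = sum(reported for _, _, _, reported in SERIES)

for name, jobs, per_job, reported in SERIES:
    nominal = nominal_mevents(jobs, per_job)
    print(f"{name}: nominal {nominal:.3f} MEvents, reported {reported} MEvents")

print(f"cores: {total_cores}, jobs: {total_jobs}, "
      f"reported total: {total_reported:.2f} MEvents")
```

The job counts sum to the 1174 completed jobs quoted above, and 12 boxes at 32 cores each gives the 384 reserved cores. The nominal totals for the 9001 and 9003 series come out slightly higher than the reported ones; the page does not say how the reported numbers were rounded or whether failed jobs were excluded.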