CMU Data Challenge 2

From GlueXWiki

Revision as of 10:41, 28 March 2014

  1. At CMU we are using 12 boxes, each with four 8-core AMD Opteron processors (32 cores per box) and 64 GB of physical memory. Data are being written to a local RAID disk. Jobs are managed by PBS (TORQUE and Maui).
  2. All 384 cores are reserved for the data challenge for three weeks.
  3. Start-up problems
    • Large-cluster configuration problems slowed our start. They were resolved by tuning the PBS parameters that control the rate at which pbs_mom talks to the head node.
    • We are still battling a scheduler issue, but a work-around has been found.
    • Jobs have been running smoothly since about Tuesday.
  4. As of 10:30 am, 1087 jobs have completed.
    • 9001 series: 859 jobs, 1E7 with EM background, 1 failure, 21.47 MEvents
    • 9002 series: 225 jobs, 5E7 with EM background, 0 failures, 2.25 MEvents
    • 9003 series: 90 jobs, without EM background, 1 failure, 4.45 MEvents
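The hardware and throughput figures in the list above can be cross-checked with a few lines of arithmetic. This is a minimal sketch for illustration only, not part of the CMU production setup. It assumes that the first number on each series line is the job count for that series and that "MEvents" means millions of events. The names `series`, `events_per_job` and `total_mevents` are hypothetical.

```python
# Back-of-the-envelope check of the CMU Data Challenge 2 status numbers.
# Hardware figures and per-series counts are copied from the status list;
# events per job is plain division, not a value reported by the jobs themselves.

BOXES = 12
PROCESSORS_PER_BOX = 4
CORES_PER_PROCESSOR = 8
MEMORY_PER_BOX_GB = 64

cores_per_box = PROCESSORS_PER_BOX * CORES_PER_PROCESSOR  # 4 x 8 = 32
total_cores = BOXES * cores_per_box                         # 12 x 32 = 384
memory_per_core_gb = MEMORY_PER_BOX_GB / cores_per_box      # 64 / 32 = 2.0

# Assumed layout: (series, jobs, failures, millions of events)
series = [
    ("9001", 859, 1, 21.47),
    ("9002", 225, 0, 2.25),
    ("9003", 90, 1, 4.45),
]


def events_per_job(jobs: int, mevents: float) -> float:
    """Average number of events per job (in events, not millions)."""
    return mevents * 1e6 / jobs


# Total events produced across all three series, in millions.
total_mevents = sum(mevents for _, _, _, mevents in series)

if __name__ == "__main__":
    print(f"cores per box: {cores_per_box}, total cores: {total_cores}")
    print(f"memory per core: {memory_per_core_gb:.1f} GB")
    for name, jobs, failures, mevents in series:
        print(f"{name}: {events_per_job(jobs, mevents):,.0f} events/job, "
              f"{failures} failure(s)")
    print(f"total: {total_mevents:.2f} MEvents")
```

Under these assumptions the 9002 series works out to 10,000 events per job, and the three series together account for 28.17 million events.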