GlueX Offline Meeting, August 7, 2018

From GlueXWiki
Revision as of 16:03, 21 August 2018 by Marki (Talk | contribs) (HDGeant4 issues)

GlueX Offline Software Meeting
Tuesday, August 7, 2018
2:00 pm EDT
JLab: CEBAF Center A110
BlueJeans: 968 592 007

Agenda

  1. Announcements
    1. reconstruction launch version set: version_recon-2017_01-ver03_jlab.xml (Mark)
    2. Status of Recon Launch (Alex A.)
  2. Review of minutes from the July 24 meeting (all)
  3. Splitting up Sim-Recon: Aftermath (Sean, Mark)
  4. HDGeant4 issues (all)
  5. Review of recent pull requests (all)
  6. Review of recent discussion on the GlueX Software Help List (all)
  7. Action Item Review (all)

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2018 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2018/ .

Minutes

Present:

  • CMU: Curtis Meyer
  • FIU: Mahmoud Kamel
  • FSU: Sean Dobbs
  • JLab: Alex Austregesilo, Thomas Britton, Mark Dalton, Stuart Fegan, Mark Ito (chair), David Lawrence, Justin Stevens, Beni Zihlmann

The chairman neglected to hit the record button on BlueJeans.

Announcements

  1. reconstruction launch version set: version_recon-2017_01-ver03_jlab.xml. The tag of sim-recon used in the reconstruction has been built on five platforms.
  2. Status of Recon Launch: Alex A.
    • We are using QCD12 boxes. The farm18 nodes are shown on the SciComp webpage (see figure) but are not in active use yet.
      Farm nodes.png
    • There are 700-800 jobs running simultaneously.
    • There will be 300 to 400 more when the farm18 nodes are activated.
    • We are 85% done with 2016 data; it will be done in 2 or 3 days.
    • Spring 2017 reconstruction should take 15 to 20 days.
    • There is a possible problem with cache disk space if we run more jobs simultaneously: our pin quota is used up.
    • David remarked that since we are copying the raw data files to the local disk first, they could be unpinned as soon as they are copied.

Review of minutes from the July 24 meeting

We went over the minutes.

NERSC Update

David gave us an update.

  • Chris Larrieu is back from vacation and has addressed some swif2 issues.
  • Test of reconstruction of one run with 220 files ran into a 20-job-at-a-time limit imposed by swif2. The limit is motivated by having only 1 TB of disk space at NERSC. More space than that is needed to keep the pipe full.
  • David has consulted with a Brookhaven physicist who has been working with more space.
  • Reserving an entire node is possible, but you have to "pay" in advance for the time and it may be hard to get credit back for failed jobs.
  • David plans to move to a 20 TB "cache" disk (with a file lifetime limit).
  • The plan is to try a monitoring launch over Spring 2018 data first.
  • Sean asked what software tag was going to be used. He cautioned that there is CDC reconstruction code that should be added to augment the code being used for the current reconstruction launch.
  • Alex cautioned that the monitoring launch uses many more plugins than are used in reconstruction launches. More memory may be required.

Splitting up Sim-Recon: Aftermath

Mark led us through the announcement of the split, performed Monday, July 30, and a wiki page he wrote describing how to recover branches and tags from the sim-recon repository when using the new halld_recon and halld_sim repositories.

Items that still need to be addressed:

  1. The use of the HALLD_MY directory needs to be revisited with the split repositories.
  2. A procedure for recovering tagged versions of sim-recon and deploying them in the split repositories needs to be developed.
  3. The automatic builds triggered by pull requests need to be implemented on the new repositories.

HDGeant4 issues

We reviewed the recent pull requests from Richard Jones fixing separate issues in the FDC simulation, one in HDGeant (GEANT 3) and the other in HDGeant4. See his comment, submitted today, on HDGeant4 Issue #54. Corresponding pull requests to the halld_sim and hdgeant4 repositories have been merged to their respective master branches.

Review of recent pull requests

The title of pull request #1180 from David served to upbraid us for adding frustration to his workflow. The issue is respect (or rather disrespect) for a mechanism for building sim-recon (at the time of the request) without all of the packages we build, i.e., a mechanism for having optional packages. Whether a package is optional or not is signaled by the absence or presence of the home environment variable for the package. When collaborators do not respect this convention, David is stuck either building the suddenly non-optional package or coding the mechanism in himself.
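As an illustration of the convention described above, here is a minimal Python sketch (Python being the language of SCons scripts). It is not the SBMS code. The package names, the variable name OPTIONAL_PKG_HOME, and both helper functions are hypothetical, and treating an empty variable as "absent" is an assumption. The point is that code depending on an optional package should first check for that package's home variable instead of assuming it is set.

```python
import os


def optional_package_enabled(home_var, environ=None):
    """Return True when the package's home variable is set and non-empty.

    Assumption: an empty or whitespace-only value counts as "not set".
    """
    env = os.environ if environ is None else environ
    return bool(env.get(home_var, "").strip())


def select_packages(required, optional_home_vars, environ=None):
    """Return the list of packages to build.

    required           -- names that are always built
    optional_home_vars -- maps an optional package name to the name of
                          its home environment variable (hypothetical)
    """
    selected = list(required)
    for name, home_var in optional_home_vars.items():
        if optional_package_enabled(home_var, environ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical names, used only to show the pattern.
    optional = {"optional_pkg": "OPTIONAL_PKG_HOME"}
    print(select_packages(["core_lib"], optional))
```

With the variable unset, only the required package is selected. Setting it in the shell before the build adds the optional package without any change to the build scripts.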

David has looked into the idea of having build "flavors": configurations of the build with optional packages explicitly identified. That takes the configuration out of the shell environment. He thinks that we may be due for refactoring the SCons build system (SBMS) in any case.

Review of recent discussion on the GlueX Software Help List

We looked at recent posts.

  • None of those present, other than Mark, have experienced the halld web authentication error (401).
  • The cause of the g++ internal compiler error is still a mystery. It occurs on random source code files during the single-threaded build of hdgeant4, on the ifarm machines and on no others.