GlueX Offline Meeting, February 9, 2018
GlueX Offline Software Meeting
Friday, February 9, 2018
10:00 am EST
JLab: CEBAF Center A110
BlueJeans: 968 592 007
- Review of minutes from the January 26 meeting (all)
- Collaboration Meeting
- SciComp Meeting Report (Mark)
- New releases: build_scripts 1.26, rcdb 0.03, sqlitecpp 2.2.0, sim-recon 2.26.0, hdgeant4 1.6.0 (Mark)
- AMD benchmark results (Sean)
- GlueX + NERSC (David)
- Review of recent pull requests (all)
- Review of recent discussion on the GlueX Software Help List (all)
- Meeting on Containers, 11:30 today, A110. Same BlueJeans number.
- Action Item Review (all)
- The BlueJeans meeting number is 968 592 007.
- Join the Meeting via BlueJeans
Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2018 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2018/.
- CMU: Curtis Meyer
- FSU: Sean Dobbs
- JLab: Alex Austregesilo, Thomas Britton, Hovanes Egiyan, Mark Ito (chair), David Lawrence, Beni Zihlmann
- Yerevan: Hrach Marukyan
There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.
- Reclamation of halld-scratch volume set. 19 of our tapes are about to be erased.
- New top-level directory: /mss/halld/detectors. A new tape directory for data related to specific detectors, e.g., /mss/halld/detectors/DIRC.
- New sim-recon release: version 2.23.0. This release includes Simon's newly tuned parameters for track matching to calorimeter clusters.
- New simple email list: online_calibrations. Sean has created a Simple Email List for those interested in keeping track of the latest calibration results.
- Launches. Thomas caught us up.
- There is a monitoring launch that will start soon.
- The reconstruction launch with recent software on Spring 2016 data has started.
- Several people are looking at the anomaly that Beni has identified, visible in some TOF monitoring plots (though the TOF is blameless).
Review of minutes from the January 26 meeting
We went over the minutes.
In the context of discussing the recent changes in how the offline software uses the RCDB, we started a more general discussion of whether certain packages, and RCDB in particular, should be optional. David explained that certain packages (among them RCDB and the ET system) have been maintained as optional. The mechanism:
- If the package's "home" environment variable (e.g., RCDB_HOME) is not defined, then at build time the build system (i.e., SCons) omits the corresponding include paths and libraries from the build commands. No dependence on the package is built into the resulting code. A warning that this is happening is printed during the build; it does not halt the build.
- If any program requires the missing package, it is built so that when the program is run, it exits immediately with an error informing the user that the environment variable for the missing package needs to be defined and that the package needs to be built. The build of the program can then be redone.
- To implement this behavior, appropriate C-preprocessor directives need to be included in the source code so that the resulting program behaves correctly depending on whether the home environment variable was set at build time.
There was a lot of discussion on whether this is a useful feature that should be maintained, mainly between two of us. An incomplete summary:
- David. Having the ability to exclude packages whose functionality is completely irrelevant to the software builder is very convenient: extraneous packages need not be built and the resulting disk footprint of the code is reduced.
- Mark. Having this flexibility requires extra coding and creates trap doors that software builders can fall into. If a needed environment variable is missing, the warning at build time is easily missed, and the error at run time may leave the user (who may not be the builder) at a loss as to how to proceed.
Mark will put the issue on the agenda of a future meeting.
Release Management Thoughts
In the course of discussing Sean's presentation from last time, Thomas brought up the idea of breaking up the sim-recon repository into two repositories. In a moment of inspiration he came up with a concept for the split: one repository for simulation and one for reconstruction. This would simplify the task of "release management" (as defined last time). The technical advantage is that simulation code can be versioned independently of the reconstruction code. Right now, Sean has to maintain a reconstruction-fixed-simulation-changing branch of sim-recon to get the right behavior. If the two functions were versioned separately, this recon-fixed-sim-changing property would be manifest.
We noted that the two sides (sim and recon) are functionally closely tied together. The question is the degree of independence of the development streams on the two sides. E.g., if the drift-time simulation for the tracking chambers is improved, does the tracking reconstruction have to change? Likely not; the simulation side can go ahead independently. On the other hand, if there is reconstruction code that does one thing for real data and another for simulation, and the simulation is improved so that the unequal treatment is no longer necessary, then both sides have to change together. In the latter case it would be easier if both sides were in the same repository.
Mark will put the issue on the agenda of a future meeting.
Not-the-TOF Anomaly in Monitoring Histograms
Alex noted that the problem first appeared several months ago when the material maps were changed in the CCDB. Beni has reported that the most recent version of the code does not exhibit the problem, so the current mystery is how the problem could have possibly fixed itself.
GlueX + NERSC
David has succeeded in running GlueX reconstruction jobs on two of the NERSC supercomputers.
- Cori I: Haswell (comparable to the JLab farm)
- Cori II: Knights Landing (KNL)
He analyzed two runs on both architectures; see the results on his slide. He notes that the KNL jobs ran 2.4 times slower than the Haswell jobs even though they used four times as many threads, i.e., roughly ten times (2.4 × 4 ≈ 9.6) slower on a per-thread basis.
Sean has put together a nice little session for us on the Collaboration Meeting agenda.
SciComp Meeting Report
Mark reported items from the Scientific Computing meeting held Thursday, February 1.
- Change to fair share allocations.... The change was discussed at the meeting.
- ENP consumption of disk space under /work.
- The second shelf of traditional raid is up and running. Our work disk quota has been increased to 110 TB from 66 TB.
- Hall B migration to the new work disk is underway. That should free up 85 TB of cache space. Mark emphasized that Hall D needs more cache space.
- Hall B needs 40 GB per job for the Java virtual machine. This is causing cores to go idle as nodes are running in a memory-limited mode.
- New tape drives: they are playing with four new LTO-7 drives and four new LTO-8 drives, and they are planning the upgrade path.
AMD benchmark results
Sean purchased a box with the new AMD EPYC processors and ran hd_root benchmarks on it. For comparison he ran the same tests on gluon119, which has Intel Xeon processors. See his slide for details and results. Scaling for the two systems is comparable as the number of threads is increased, though the AMD processors come at one-third the price (for the CPU package itself).
Meeting on Containers
Mark announced a meeting later in the day to discuss use of containers (Docker, Singularity) in various computing contexts (NERSC, OSG, JLab farm, personal laptops). There is a lot of ground to cover here; a series of meetings is likely.