GlueX Software Meeting, August 18, 2020
GlueX Software Meeting
Tuesday, August 18, 2020
3:00 pm EDT
BlueJeans: 968 592 007
- Announcements
- Draft of DSelector documentation (Sean)
- New version set: 4.24.1 (Mark)
- Review of Minutes from the Last Software Meeting (all)
- Report from the Last HDGeant4 Meeting (all)
- Report from SciComp Meeting, August 6 (Mark)
- Restoration of Execution Tests for Pull Request Builds (Sean)
- Review of recent issues and pull requests:
- halld_recon
- halld_sim
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
Present: Sean Dobbs, Mark Ito (chair), Igal Jaegle, Naomi Jarvis, Justin Stevens, Nilanga Wickramaarachchi, Beni Zihlmann
There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to authenticate.
- Draft of DSelector documentation: Sean noted that the latest version of the document is on Overleaf. Naomi is the owner. Please contact her if you would like to review or contribute to the document.
- Naomi, Justin, and Beni noted that JLab now has an Overleaf license of some sort. Mark will contact David Lawrence about the details. [Added in press: Mark forwarded the response he got from David to the aforementioned folks.]
- Naomi reported on an issue with "Zombie ProofLite servers". Her report:
- We were having a problem with zombie proofserv.exe processes left behind on our cluster nodes after the jobs completed; they were not cleaned up by Slurm. Every so often, one of many worker threads would fail to initialize properly, as seen in the note on how many threads had gone parallel. The job output was unaffected, although presumably the job took fractionally longer than it should have, but after hundreds of DSelector jobs had been run, there were a large number of zombie processes left behind.
- It turned out to be a network issue of sorts, which caused TProof to time out (after only 5 seconds) on some of the forked threads. Since it timed out, it skipped that thread and did not destroy it when ROOT exited. Setting a longer timeout solved the problem.
- This can be done by appending the following line to ${HOME}/.rootrc: "ProofLite.StartupTimeOut 1800", which gives a very generous timeout allowance of 30 minutes.
- New version set: 4.24.1. Mark pointed out the following:
- This version set has the Python-3 build changes incorporated.
- A new container was created to track the new RPM required for HDGeant4 (tirpc-devel).
- The patch version set solves the "hang on first event" problem of HDGeant4.
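The .rootrc change from Naomi's report above can be applied by hand, but on a cluster with many accounts a small script may be more convenient. The following is a minimal sketch, not a tested site procedure: the function name ensure_startup_timeout is hypothetical, and the only things taken from the report are the setting name, the value of 1800 seconds, and the form of the line.

```python
from pathlib import Path

# Setting name and line format are taken verbatim from Naomi's report.
SETTING = "ProofLite.StartupTimeOut"


def ensure_startup_timeout(rootrc: Path, seconds: int = 1800) -> bool:
    """Append the timeout line to rootrc unless the file already sets it.

    Returns True if the file was modified, False if it was left alone.
    (Hypothetical helper, written for illustration only.)
    """
    text = rootrc.read_text() if rootrc.exists() else ""

    # Leave the file untouched if any non-comment line already sets the value.
    for line in text.splitlines():
        if line.strip().startswith(SETTING):
            return False

    # Make sure the new entry starts on its own line.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with rootrc.open("a") as handle:
        handle.write(f"{prefix}{SETTING} {seconds}\n")
    return True
```

Calling ensure_startup_timeout(Path.home() / ".rootrc") would apply the change to the current user's file. The check before appending keeps the script safe to run more than once.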
Review of Minutes from the Last Software Meeting
We went over the minutes from the August 4 meeting.
- Since the last meeting, no one has noticed the slow wiki access that we discussed. The only known change was the doubling of the period between refreshes of the MCwrapper web page that Thomas Britton reported last time.
- The dE/dx theta correction for the CDC seems happy in its new home, merged onto the master branch of halld_recon.
- The Python 3 compatible build that Mark described was not as universal as he had hoped. It continues to work for both CentOS 7 and Fedora 32, but Ubuntu 20 and CentOS 8 did not work without modifications to the scheme. The problem is the way different distributions treat the interpretations (or lack thereof) of Python, Python 2, Python 3, and of SCons, SCons 2, SCons 3. It turns out that each distribution is a special case. This has delayed deployment of a CentOS 8 container.
- Naomi reported a new feature that Theo Larrieu added to the electronic logbook that allows selection of multiple images to be uploaded at one time. This has worked for her. Mark will check to see if Mark Dalton has used the feature.
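The distribution differences in the Python 3 build item above come down to which command names exist on a given system. The following is a rough sketch of how one might probe for them; it is not the scheme Mark used. The function names are hypothetical, and the candidate command names in the usage note are assumptions chosen for illustration.

```python
import shutil


def available_commands(names, which=shutil.which):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def first_available(names, which=shutil.which):
    """Return the first command name in `names` that resolves on PATH, else None."""
    for name in names:
        if which(name) is not None:
            return name
    return None
```

For example, first_available(["python3", "python"]) picks whichever interpreter name the distribution provides, and available_commands(["scons", "scons-3"]) shows at a glance which SCons launchers are installed. The `which` argument is there so the lookup can be replaced in tests.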
Report from the Last HDGeant4 Meeting
We went over the minutes from the meeting on August 11 without significant comment.
Report from SciComp Meeting, August 6
Mark showed a slide summarizing items from the meeting. The big message is that Scientific Computing is moving to substantial support of the OSG at JLab, including contributing compute resources. Details of the plan are not known at present, but they are working on getting all of the component pieces working.
Restoration of Execution Tests for Pull Request Builds
Sean described bringing back execution of binaries, e.g., hd_root, as part of the automatic pull request test procedure. This has been broken for many moons. In the process, Mark improved the environment set-up procedure. We need to do similar tests for hdgeant4 next. Sean will look into it.
Review of recent issues and pull requests
Sean discussed halld_recon Pull Request #432, "Suppress geometry-related warning messages". At present, merging is waiting on a new version of JANA to appear in a future version set. This is on Mark's list.
Action Item Review
- Ask David about JLab's Overleaf license (Mark, done)
- Create an FAQ on the zombie ProofLite solution (Mark)
- Ask Mark D. about multiple file uploads with the electronic logbook (Mark)
- HDGeant4 run-time testing for pull requests (Sean)
- Create a version set with a new version of JANA sourced from GitHub (Mark)