GlueX Software Meeting, March 30, 2022
GlueX Software Meeting
Wednesday, March 30, 2022
1:30 pm EDT
Zoom Meeting ID: 161 869 2159 Passcode: 634605
Join ZoomGov Meeting https://jlab-org.zoomgov.com/j/1618692159?pwd=VGpjT1BZR2hKWmp1S0E3aEpHUlRiQT09
- Review of Minutes from the Last Software Meeting (all)
- FAQ of the Fortnight: How do I use secure shell (ssh) to access GitHub?
- JANA2 (Nathan)
- Changing BMS_OSNAME (Mark)
- Succession of software coordinator responsibilities: spreadsheet
- Review of recent issues and pull requests:
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
Present: Alex Austregesilo, Nathan Brei, Thomas Britton, Eugene Chudakov, Sergey Furletov, Mark Ito (chair), Igal Jaegle, Richard Jones, David Lawrence, Curtis Meyer, Justin Stevens, Simon Taylor, Beni Zihlmann
There is a recording of this meeting on the ZoomGov site. (Passcode: 7itcSj^X)
This will be the last Software Meeting that Mark will chair. He is retiring from the Lab.
Review of Minutes from the Last Software Meeting
We went over the minutes from the meeting on March 16th.
- Mark mentioned that he is seeing an increasing number of requests from users to be added to the halld Slurm account when they are already members of the halld Unix group. He does not know why. The command-line actions linked from the FAQ for managing the Slurm account work well.
FAQ of the Fortnight: How do I use secure shell (ssh) to access GitHub?
Mark led us through the FAQ that describes a method that allows password-less interaction with the GitHub site, including pushes, using ssh keys. This method became more attractive after GitHub removed the ability to use one's account password for authentication. A personal access token is now required in lieu of the password.
Richard remarked that support for ssh authentication may be dropped in the not-so-distant future.
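As a small illustration of one step in this kind of setup: a clone made over HTTPS has to have its remote switched to the SSH form of the URL before ssh keys are used. The sketch below converts one form to the other. The function name `https_to_ssh` is hypothetical and the repository shown is only an example; the FAQ itself may describe the step differently.

```python
import re


def https_to_ssh(url: str) -> str:
    """Convert an HTTPS GitHub repository URL to its SSH (git@github.com:) form."""
    # Accepts URLs with or without a trailing ".git" or "/".
    match = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if match is None:
        raise ValueError(f"not an HTTPS GitHub repository URL: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"


# Example repository, used here for illustration only.
print(https_to_ssh("https://github.com/JeffersonLab/halld_recon.git"))
```

The resulting URL can be passed to `git remote set-url origin <url>`, and `ssh -T git@github.com` checks whether GitHub accepts the key.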
JANA2
Nathan described efforts to validate REST file production with JANA2 by comparing its results with those from the same job run with JANA. By exploiting the toStrings member function, present in all JANA classes, he can create a text dump of produced objects in the two contexts and compare them. That approach produced a large number of diffs that are hard to interpret. To address that problem, he recently developed a technique where objects can be compared in the order that they are produced. That way the difference produced furthest upstream can be identified and separated from the resulting downstream differences. In addition, he has developed a new tool, JInspector, that allows inspection of factories during execution. The scheme allows invocation of JInspector from a gdb session. The next step is to build a system that automatically compares JANA output with that of JANA2, halts when differences are found, and allows interactive invocation of JInspector.
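The idea of comparing objects in production order can be sketched as follows. This is not Nathan's code: the function name `first_divergence`, the record layout (producer name plus text dump), and the sample values are assumptions made for illustration.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple

# Hypothetical record: (name of the producing factory, text dump of the object)
Record = Tuple[str, str]


def first_divergence(
    reference: Iterable[Record], candidate: Iterable[Record]
) -> Optional[Tuple[int, Optional[Record], Optional[Record]]]:
    """Return (index, reference record, candidate record) at the first mismatch.

    Both sequences are assumed to be in production order, so the first
    mismatch is the one furthest upstream. Returns None if they agree.
    A missing record on either side shows up as None in the result.
    """
    for index, (ref, cand) in enumerate(zip_longest(reference, candidate)):
        if ref != cand:
            return index, ref, cand
    return None


# Made-up dumps for illustration; the values are not real output.
jana = [
    ("DTrackCandidate", "p=1.2034"),
    ("DTrackWireBased", "p=1.2101"),
    ("DTrackTimeBased", "p=1.2099"),
]
jana2 = [
    ("DTrackCandidate", "p=1.2034"),
    ("DTrackWireBased", "p=1.2102"),  # first difference, furthest upstream
    ("DTrackTimeBased", "p=1.2100"),  # downstream consequence
]

print(first_divergence(jana, jana2))
```

Reporting only the first mismatch keeps the downstream differences, which are consequences of it, out of the way until the upstream one is understood.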
Richard reported that he has developed a tool that does similar things in the context of ensuring reproducibility of results on repeated runs of our code. To use the tool, one instruments the code being examined such that intermediate results from selected points of execution (i.e., those instrumented) are dumped to a binary file the first time through execution. Subsequent execution of the same code will notice that the binary log exists and compare the results from the current execution to those obtained in the initial pass. One complication that is addressed is that many discrepancies only occur in a multi-threaded context, where the results are generally not produced in the same order from run to run. This means that there has to be a search of the log, forward and backward, for each result as it is produced in the current execution of the program. When a difference is detected, the user is dropped into the debugger. The tool is available on GitHub.
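The record-then-compare scheme can be illustrated with a minimal sketch. This is not Richard's tool: the class name `ReproLog`, its methods, and the use of `pickle` for the binary log are all assumptions chosen to keep the example short.

```python
import os
import pickle


class ReproLog:
    """Record tagged values on the first run; compare against them on later runs."""

    def __init__(self, path):
        self.path = path
        # If no log exists yet, this run becomes the reference.
        self.recording = not os.path.exists(path)
        self.records = []  # (tag, value) pairs in the order first produced
        self.cursor = 0    # log position of the most recent match
        if not self.recording:
            with open(path, "rb") as f:
                self.records = pickle.load(f)
        self.used = [False] * len(self.records)

    def check(self, tag, value):
        """Return True if (tag, value) is consistent with the reference run."""
        if self.recording:
            self.records.append((tag, value))
            return True
        # Results may arrive in a different order in a multi-threaded run,
        # so search outward from the cursor, forward and backward.
        n = len(self.records)
        for offset in range(n):
            for i in (self.cursor + offset, self.cursor - offset):
                if 0 <= i < n and not self.used[i] and self.records[i] == (tag, value):
                    self.used[i] = True  # each logged result matches only once
                    self.cursor = i
                    return True
        return False

    def close(self):
        """Write the log to disk at the end of a recording run."""
        if self.recording:
            with open(self.path, "wb") as f:
                pickle.dump(self.records, f)
```

In this sketch a caller would wrap each instrumented point in `log.check(tag, value)` and could call Python's built-in `breakpoint()` when it returns False, which mimics being dropped into the debugger.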
Nathan asked Richard how he compares ROOT histograms generated from his test runs. Richard wrote some personal code to do the job.
Alex asked Richard where he is on the reproducibility front. Richard has put the effort aside temporarily after noticing that additional issues have cropped up in recent bouts of code development. Reproducibility is a major concern for Nathan when comparing JANA and JANA2. If JANA is not producing the same results every time, then that will show up as a difference with JANA2 that has nothing to do with JANA2 itself.
Richard explained that the main contributor to non-reproducibility is the ANALYSIS library. The code has many history-driven complications, complications that not only induce problems, but make debugging difficult.
Nathan has been looking only at REST production. Justin suggested that he look at the entire suite of plugins used in reconstruction launches as an acid test of result registration between JANA and JANA2.
Justin asked at what point we declare victory. Mark thought that giving up on bit-for-bit reproducibility and bit-for-bit registration between JANA and JANA2 would be a form of surrender. They underlie validation of any number of future software developments; one must know that things that should not be changing are in fact not changing. On the other hand, those goals are formidable from where we sit now. There might be a judgment to make on how seriously this uncomfortable position affects physics results.
Changing BMS_OSNAME
Mark presented a proposal for changing the value of BMS_OSNAME, the environment variable that we use to identify properties of the operating system in use. One of the main goals is to use a single build for both the CentOS 7 nodes on the JLab farm and the CentOS 7 container on the OSG and HPCs. See his slides for the details. From the discussion:
- There are cases where the processor architecture field (e.g., x86_64) is useful. We should consider keeping it.
- Richard exploits the feature of having different OSs with separate bin, lib, etc. directories in a built package, with all OSs sharing the same code.
- We were not ready to make a decision on going forward with the proposal. More discussion is needed.
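To make the two discussion points about the architecture field and the per-OS directories concrete, here is a minimal sketch of composing an OS tag with an optional architecture field and deriving per-OS directories from it. The field layout, the function names `compose_osname` and `platform_dirs`, and the example values are assumptions for illustration; they are not the format proposed in Mark's slides.

```python
from pathlib import Path


def compose_osname(os_name, os_version, compiler, arch=None, kernel="Linux"):
    """Build an OS tag from its fields; the architecture field is optional.

    The layout (kernel_OSversion[-arch]-compiler) is an assumption made
    for this sketch, not the format under discussion.
    """
    fields = [f"{os_name}{os_version}"]
    if arch:
        fields.append(arch)
    fields.append(compiler)
    return f"{kernel}_" + "-".join(fields)


def platform_dirs(package_root, osname):
    """Return the per-OS bin and lib directories under a shared source tree."""
    base = Path(package_root) / osname
    return {"bin": base / "bin", "lib": base / "lib"}


with_arch = compose_osname("CentOS", "7", "gcc4.8.5", arch="x86_64")
without_arch = compose_osname("CentOS", "7", "gcc4.8.5")
print(with_arch)
print(without_arch)
print(platform_dirs("/path/to/package", with_arch)["bin"])
```

Keeping the tag as the only OS-dependent path component is what lets several operating systems share one copy of the source while each has its own bin and lib directories.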
Succession of software coordinator responsibilities
We took a quick look at the Google Doc listing the responsibilities. Time was running short so Alex proposed a dedicated meeting of interested parties later in the week.