HDGeant4 Meeting, October 22, 2019

From GlueXWiki

Revision as of 15:29, 23 October 2019

HDGeant4 Meeting
Tuesday, October 22, 2019
2:00 pm EDT
JLab: CEBAF Center, A110
BlueJeans: 968592007

Agenda

  1. Review of minutes from the last Software Meeting (all)
  2. Report from SciComp Meeting (Mark)
  3. Geometry update in CCDB to support TOF upgrade (Richard)
  4. Issues on GitHub
  5. Pull Requests on GitHub
  6. Action Item Review

Minutes

Present:

  • CMU: Naomi Jarvis
  • FSU: Sean Dobbs
  • JLab: Alex Austregesilo, Mark Ito (chair), Igal Jaegle, Keigo Mizutani, Simon Taylor, Beni Zihlmann
  • UConn: Richard Jones

There is a partial recording of this meeting on the BlueJeans site. Use your JLab credentials to get access.

Note: many of the agenda items were held over from the Software Meeting that should have happened last week.

Review of minutes from the last Software Meeting

We went over minutes from the meeting on September 17.

  • Sascha Somov has checked in a fix for the tagger energy determination. He has created a new CCDB table to do so, but so far it is only populated for PrimEx runs. The software senses the presence of the table and uses it if it is available.
  • On software versions and calibration constant comparisons, Sean revised his approach for the FCAL gain constant used in Monte Carlo to use a new CCDB calibration table. This restores backward compatibility of the code with legacy calibration constant sets. See halld_sim pull request #75.
  • CCDB Ancestry Control: Mark has discussed this with Dmitry, who has a relatively easy solution. They will work out the exact implementation.
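The "use the new table if it is there" behavior described for the tagger energy fix can be sketched as a lookup with a fallback. This is a minimal illustration and not the actual halld_recon code or the CCDB API. The calibration database is modeled as a plain dictionary, and the table paths `PHOTON_BEAM/new_tagger_scheme` and `PHOTON_BEAM/legacy_tagger_scheme` are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical stand-in for a calibration database: table path -> {run: constants}.
CalibDB = Mapping[str, Mapping[int, Sequence[float]]]

NEW_TABLE = "PHOTON_BEAM/new_tagger_scheme"        # hypothetical table name
LEGACY_TABLE = "PHOTON_BEAM/legacy_tagger_scheme"  # hypothetical table name


def lookup(db: CalibDB, table: str, run: int) -> Optional[Sequence[float]]:
    """Return the constants stored in `table` for `run`, or None if absent."""
    return db.get(table, {}).get(run)


def tagger_constants(db: CalibDB, run: int) -> Tuple[str, Sequence[float]]:
    """Use the new table when it is populated for this run, else the legacy one."""
    constants = lookup(db, NEW_TABLE, run)
    if constants is not None:
        return "new", constants
    constants = lookup(db, LEGACY_TABLE, run)
    if constants is None:
        raise LookupError(f"no tagger constants for run {run}")
    return "legacy", constants
```

Because the new table is consulted first and only populated for some runs (PrimEx runs, per the minutes), runs without an entry keep their old behavior.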

Report from SciComp Meeting

Mark reviewed his notes from the SciComp meeting held on October 17.

  • New farm nodes are due in at the end of the month. They have AMD Rome processors.
  • The farm will get an upgrade from RHEL 7.2 to 7.7 in the coming weeks.
    • Beni urged us to make sure our code works with a RHEL 7.7 system as soon as one becomes available.
  • Slurm was upgraded when its server was moved to more robust hardware.
  • There was an issue where the firewall between the farm network enclave and the general computing network would become saturated, which could slow down ifarm response. A new firewall is on order.
  • Brad Sawatzky from Hall C has reported seeing deadlocks with flock on the work disk (a ZFS system shared among all of the Halls). These look suspiciously similar to the problems that Simon and Alex have reported, and they seem to have surfaced recently. SciComp is working on it.
    • Mark brought up an idea, often raised by David Lawrence, of a compact SQLite CCDB database that serves out only a subset of run numbers. This would help because a common practice is to copy the SQLite file to a local disk and access it from there to avoid problems with network-mounted file systems, and such a file might be considerably smaller than the full 1 GB database. Naomi suggested that a file serving an entire run period would be most useful. There was interest in developing the idea, although Alex pointed out that the full database is not all that large.
  • Brad S. is planning a JLab Scientific Computing Workshop.
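The run-subset idea can be sketched with Python's standard `sqlite3` module. This is only an illustration: it uses a single hypothetical `constants` table with a validity range per row. The real CCDB schema (run ranges, variations, assignment history) is more involved, so an actual tool would have to follow that schema instead.

```python
import sqlite3

# Hypothetical, much-simplified schema; the real CCDB schema is more involved.
SCHEMA = """
CREATE TABLE IF NOT EXISTS constants (
    table_name TEXT NOT NULL,
    run_min    INTEGER NOT NULL,
    run_max    INTEGER NOT NULL,
    payload    TEXT NOT NULL
)
"""


def extract_run_range(src: sqlite3.Connection, dst: sqlite3.Connection,
                      first_run: int, last_run: int) -> int:
    """Copy every row whose validity range overlaps [first_run, last_run].

    Returns the number of rows copied into `dst`.
    """
    dst.execute(SCHEMA)
    # Two closed intervals overlap when each starts before the other ends.
    rows = src.execute(
        "SELECT table_name, run_min, run_max, payload FROM constants "
        "WHERE run_min <= ? AND run_max >= ?",
        (last_run, first_run),
    ).fetchall()
    dst.executemany("INSERT INTO constants VALUES (?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)
```

In practice `dst` would be opened on a file on local disk, for example `sqlite3.connect("/tmp/ccdb_subset.sqlite")` (a hypothetical path). Following Naomi's suggestion, `first_run` and `last_run` would bracket a whole run period.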

Geometry update in CCDB to support TOF upgrade

Richard found some hard-wired geometry constants in the code for hdgeant and hdgeant4. The total number of counters was not in the HDDS XML description; it has now been added. A new attribute has also been added to indicate whether a channel reads out a single-ended or a double-ended module. For hdgeant4, there is a hard exit if this "pairing" information is missing. The new scheme has been tested on old and new run numbers and has been merged to the master branch.
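The sketch below shows two things: reading the counter count from the geometry description instead of hard-wiring it, and exiting if the pairing information is absent. It uses Python's `xml.etree.ElementTree`. The element and attribute names (`tof`, `ncounters`, `counter`, `pairing`) and the values `single`/`double` are hypothetical stand-ins, not the actual HDDS names.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical XML layout; the real HDDS element and attribute names may differ.
EXAMPLE = """
<tof ncounters="3">
  <counter id="1" pairing="double"/>
  <counter id="2" pairing="single"/>
  <counter id="3" pairing="double"/>
</tof>
"""


def read_tof_layout(xml_text: str) -> Dict[int, str]:
    """Return {counter id: 'single' | 'double'}, exiting if data are missing."""
    root = ET.fromstring(xml_text)
    ncounters = int(root.attrib["ncounters"])  # read from the description, not hard-wired
    layout: Dict[int, str] = {}
    for counter in root.iter("counter"):
        pairing = counter.get("pairing")
        if pairing not in ("single", "double"):
            # Hard exit, mirroring the hdgeant4 behavior described above.
            sys.exit(f"TOF counter {counter.get('id')}: missing or invalid pairing")
        layout[int(counter.attrib["id"])] = pairing
    if len(layout) != ncounters:
        sys.exit(f"expected {ncounters} TOF counters, found {len(layout)}")
    return layout
```

Failing loudly rather than defaulting to one readout type makes an out-of-date geometry description obvious immediately instead of silently producing wrong hits.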

TOF II Reconstruction Anomaly

Sean showed plots of TOF monitoring histograms from 200k bggen events using the TOF II geometry and reconstruction. Plotted as a function of counter number, the difference between the TOF hit position (inferred from which counter was hit) and the position from the track extrapolation shows a slope.

[Added in press: this effect was tracked down to an incorrect time propagation constant being used for simulated TOF data. Sean found and fixed it.]
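The sketch below shows why a wrong propagation constant produces a slope rather than a constant offset. It rests on these assumptions:

* a double-ended counter read out at both ends
* light traveling at a constant effective speed along the bar
* a residual that depends on the position reconstructed from the end-to-end time difference

With the hit at position x from the bar center, the time difference is 2x/v. Simulating with speed v_sim and reconstructing with v_rec therefore gives x_rec = x (v_rec / v_sim), and the residual grows linearly with x. The speeds and bar length used here are hypothetical numbers, not GlueX calibration values.

```python
from typing import List, Sequence, Tuple


def end_times(x: float, half_length: float, v: float, t0: float = 0.0) -> Tuple[float, float]:
    """Arrival times at the two ends of a bar; x > 0 is closer to the right end."""
    t_left = t0 + (half_length + x) / v
    t_right = t0 + (half_length - x) / v
    return t_left, t_right


def reconstructed_x(t_left: float, t_right: float, v: float) -> float:
    """Invert t_left - t_right = 2x/v (t0 cancels in the difference)."""
    return 0.5 * v * (t_left - t_right)


def residuals(positions: Sequence[float], v_sim: float, v_rec: float,
              half_length: float) -> List[float]:
    """x_rec - x_true when simulation and reconstruction use different speeds."""
    out = []
    for x in positions:
        t_left, t_right = end_times(x, half_length, v_sim)
        out.append(reconstructed_x(t_left, t_right, v_rec) - x)
    return out
```

For example, `residuals([-60.0, 0.0, 60.0], v_sim=15.0, v_rec=16.0, half_length=100.0)` gives values close to -4, 0, and +4 (in the same length units). The result is zero at the bar center and grows with distance from it, so it shows up as a slope if position maps onto counter number.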

Review of HDGeant4 Issues on GitHub

We skimmed headlines of the issues.

  • Alex submitted an issue where HDG4 bombs when trying to propagate neutrons. The problem is not present in HDG3. Jon Zarling reports the same error.
  • Mark mentioned that when he showed the issues list at the Geant4 Collaboration meeting that was held at JLab, one of the attendees mentioned that they had seen similar errors with G4VoxelNavigation. Mark will try to track him down and get the scoop.

Items from the Software Help List

  • Alex brought our attention to a thread on the Software Help list started by Peter Pauli. It appears that not all of the versions needed to simulate 2018 data are available on the OSG. Mark et al. will try to straighten this out.
  • Alex asked about any progress on the b1pi test error in mcsmear that Mark posted on the list. As Sean pointed out in his response, it is likely that inconsistent versions of the geometry are being used. The test is still using hdgeant, so the geometry is compiled into the binary and reflects whatever is in the HDDS build.
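One way to catch the kind of mismatch Sean suspected is to compare a digest of the geometry used by the simulation with a digest of the geometry used downstream. The sketch below uses Python's `hashlib`. It assumes both sets of geometry XML files can be obtained as text: for hdgeant, that would mean keeping the XML from the HDDS build that was compiled into the binary. The file names in the usage example are hypothetical.

```python
import hashlib
from typing import Mapping


def geometry_digest(files: Mapping[str, str]) -> str:
    """SHA-256 over (file name, contents) pairs, taken in sorted name order."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator so name/content boundaries are unambiguous
        h.update(files[name].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def same_geometry(sim_files: Mapping[str, str], reco_files: Mapping[str, str]) -> bool:
    """True when both sides were built from identical geometry descriptions."""
    return geometry_digest(sim_files) == geometry_digest(reco_files)
```

For example, `same_geometry({"geom_a.xml": "<tof/>"}, {"geom_a.xml": "<tof version='2'/>"})` is `False`. Recording such a digest alongside the b1pi test output would show directly whether the two stages disagree.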