GlueX Offline Meeting, November 1, 2017
GlueX Offline Software Meeting
Wednesday, November 1, 2017
11:00 am EDT
JLab: CEBAF Center F326/327
BlueJeans: 968 592 007
- 1 Agenda
- 2 Communication Information
- 3 Minutes
- 3.1 Announcements
- 3.2 Review of minutes from the last meeting
- 3.3 Report from the SciComp-Physics meeting
- 3.4 Computing Milestones
- 3.5 Review of recent pull requests
- 3.6 Review of recent discussion on the GlueX Software Help List
- 3.7 CPU Usage Projections
- Review of minutes from the last meeting (all)
- Report from the SciComp-Physics meeting
- suggested: stress DAQ bandwidth before the December run
- Work disk migration plan: move all of it
- ifarm1101 upgrade not worth it
- Disk: Lustre procurement decision, 200 TB, more if we can justify it
- Computing milestones
- Review of recent pull requests (all)
- Review of recent discussion on the GlueX Software Help List (all)
- Action Item Review (all)
- The BlueJeans meeting number is 968 592 007.
- Join the Meeting via BlueJeans
Talks can be deposited in the directory
/group/halld/www/halldweb/html/talks/2017 on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2017/.
- CMU: Curtis Meyer
- FSU: Sean Dobbs
- JLab: Alex Austregesilo, Thomas Britton, Brad Cannon, Eugene Chudakov, Sebastian Cole, Mark Ito (chair), Simon Taylor, Beni Zihlmann
There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.
- Change to cache deletion policy: Files that were requested from tape by farm jobs are now first in line for deletion once the requesting job has finished. This extends the life of files that users create or request directly on the cache disk.
- sim-recon 2.18.0: This release went out on October 10. It has the extensive changes to the Analysis Library from Paul Mattione.
- MCwrapper v1.9: Thomas has added a facility to automatically generate an event sample, in a user-specified run range, with runs populated in proportion to the number of events in the real data.
- Launches. Alex reported that the most recent analysis launch has been inadvertently running with a job-count restriction left over from an incident with an analysis launch a few weeks back where SciComp had to throttle the number of jobs. The restriction was lifted this morning.
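The proportional population of runs mentioned in the MCwrapper item can be illustrated with a minimal sketch. This is not MCwrapper's actual code: the function name, the run numbers, and the event counts are hypothetical, and the largest-remainder rounding is just one reasonable way to make the per-run counts add up to the requested total.

```python
def allocate_events(real_event_counts, total_events):
    """Split total_events across runs in proportion to real-data event counts.

    real_event_counts: dict mapping run number -> events recorded in real data
    total_events: number of simulated events requested for the whole run range
    """
    total_real = sum(real_event_counts.values())
    if total_real <= 0:
        raise ValueError("no real-data events in the requested run range")

    # Integer floor of each run's proportional share, plus the remainder
    # that was dropped by the floor (kept as an exact integer).
    alloc = {}
    remainder = {}
    for run, n_real in real_event_counts.items():
        alloc[run] = total_events * n_real // total_real
        remainder[run] = total_events * n_real % total_real

    # Hand the events lost to rounding to the runs with the largest remainders.
    leftover = total_events - sum(alloc.values())
    for run in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[run] += 1
    return alloc


# Hypothetical run numbers and real-data event counts, for illustration only.
real_data = {30274: 1_000_000, 30275: 3_000_000, 30276: 500_000}
print(allocate_events(real_data, 9000))
```

Integer arithmetic is used throughout so that the per-run counts always sum exactly to the requested total, regardless of how the shares round.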
Review of minutes from the last meeting
We went over the minutes from October 4.
OASIS file system
Mark discussed the cron job update problem with Richard Jones. An alternative approach has been identified, so this is no longer a problem.
Thomas reported a bug in the JANA server (which sends events up to the web browser for display). It crashes on RHEL7 and CentOS7 on a language feature not supported by the gcc 4.8.5 compiler found on those platforms. It turns out to work on RHEL6 and CentOS6 (gcc 4.9.2). Several avenues for a fix are being pursued, in consultation with Dmitry Romanov.
Work Disk Clean-Up
Mark reported that, in the latest twist, we will not need to reduce our usage on /work to a level below 45 TB before moving to the new, non-Lustre file server. Our current 55 TB will fit since not all of the Halls will be moving at once.
For the final cut-over from Lustre to non-Lustre, we will have to stop writing to /work for a short period of time. This may (or may not) present a problem if we have a large launch underway. This issue needs further discussion, but should not be a big problem.
Report from the SciComp-Physics meeting
- Chip Watson suggested that we try to stress the data stream bandwidth from the Counting House to the Computer Center before the December run, ideally in concert with a similar test by Hall B.
- ifarm1101 and the CentOS6 farm nodes will not be upgraded from CentOS 6.5 to 6.9. It is not worth the effort for a handful of nodes.
- Physics will initiate another procurement of Lustre-based disk, 200 TB worth, to augment our volatile and work space. There is the possibility for more if we can justify it.
Computing Milestones
Mark showed us an email from Amber Boehnlein from back in August, proposing an effort to develop "milestones" for Scientific Computing in the 12 GeV era, as an aid to Lab management for gauging progress. Work on this has languished and needs to be picked up again.
Review of recent pull requests
We went over the list of open and closed requests.
Pull request #947, "Updated FCAL geometry to new block size", has implications for detector calibrations. It comes in concert with updates to the FCAL geometry from Richard (HDDS pull requests #38 and #42, https://github.com/JeffersonLab/hdds/pull/42). So far, the observed effects on the calibrations have been slight. Sean is monitoring the situation to see where recalibrations may be necessary.
Review of recent discussion on the GlueX Software Help List
We went over recent items with no significant discussion.
CPU Usage Projections
Beni asked about projections for our demands on the farm during the upcoming run.
- Sean thought that now that firmware problems have been understood, calibrations should go faster.
- Alex expressed doubt about whether we can support a full reconstruction launch during the run with its attendant monitoring jobs. Mark pointed out that priorities can be adjusted between accounts to give time-critical jobs priority.
- Mark pointed out that we need to review our CPU needs in light of our newly acquired experience, much like we are doing for disk usage.