GlueX Offline Meeting, October 16, 2013
GlueX Offline Software Meeting
Wednesday, October 16, 2013
1:30 pm EDT
JLab: CEBAF Center F326/327
- Review of minutes from the last meeting: all
- Software Review Planning
- Data Challenge 2 (Mark)
- Vertex Smearing (Kei)
- b1pi and single track test failures (Mark)
- Mantis Bug Tracker Review
- Review of recent repository activity: all
- ESNet: 8542553
You can view the computer desktop in the meeting room at JLab via the web.
- Go to http://esnet.readytalk.com
- In the "join a meeting" box enter the Hall D code: 1833622
- Fill in the participant registration form.
To connect by telephone:
- US and Canada: (866)740-1260 (toll free)
- International: (303)248-0285 (toll call) or look up toll-free number at http://www.readytalk.com/intl
- enter access code followed by the # sign: 1833622#
Talks can be deposited in the directory
/group/halld/www/halldweb/html/talks/2013-3Q on the JLab CUE. This directory is accessible from the web at https://halldweb.jlab.org/talks/2013-3Q/ .
A recording of the meeting (audio and slides) is available for a month or so.
Attendees
- CMU: Paul Mattione, Curtis Meyer
- IU: Kei Moriya, Matt Shepherd
- JLab: Mark Dalton, Hovanes Egiyan, Mark Ito (chair), David Lawrence, Simon Taylor, Elliott Wolin, Beni Zihlmann
- Northwestern: Sean Dobbs
- UConn: Alex Barnes
The work disk at JLab filled up yesterday. Collectively we deleted 1 TB, and the Computer Center should add 2 TB more today, bringing the total to 12 TB.
Elliott told us that a much larger amount of disk space will be available in the counting house soon, before it is needed for online tasks.
Review of minutes from the last meeting
We went over the minutes from September 18.
- The decay chain reporting issue last mentioned at the collaboration meeting has an interim solution from Mark and Beni using a new attribute for the product element in the hddm_s data model (on a branch). The result is a seamless genealogy, with generations allowed both in the generator (bggen and others) and in the detector simulation (hdgeant). They are working on a tweak that will eliminate the need for the new attribute; when finished, it will appear on the trunk.
- Simon discovered a bug in hdgeant that was causing the single-track reconstruction test to fail part-way through the job. It had to do with the code keeping track of secondary vertices. The next run should have the full complement of events.
- Mark pointed out that the change to the decay chain reporting (see above) would re-do this portion of the code completely. Simon's change is welcome nonetheless, for the interim.
Software Review Planning
We had an initial planning meeting last Friday. We blocked out the talks and topics for the review and identified speakers. Another meeting is planned for tomorrow.
Curtis is also preparing a comprehensive document for distribution to the committee in advance of the review. We will be able to describe progress made since the last review in detail here; many topics will have to be omitted or touched on only lightly in the oral presentations because of time. He used the analogous document from the last review as a starting point.
Data Challenge 2
Mark has run two more mini-DCs since the collaboration meeting (1000 jobs each).
The first of these showed the same calibration database problem as the one reported at the collaboration meeting. He was able to catch the database server in the act of non-responsiveness. It turns out to be a memory problem on the server that occurs when many jobs are running at the same time and a large data set, such as the magnetic field map, is requested.
The problem was solved in the second mini-DC by using the SQLite version of the database. This is a server-less, file-based system and does not suffer from the memory limitation. In addition, it is a convenient solution for distributing calibration information to remote sites for mass processing: a single file encapsulates all needed calibration constants.
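The server-less character of SQLite can be illustrated with a short sketch. The table, column, and constant names below are purely illustrative, not the actual CCDB schema; the point is that reading constants is a local file operation requiring no network connection and no memory on a central server.

```python
import sqlite3

# Build a toy calibration file (a stand-in for the real CCDB SQLite dump;
# the table and column names here are illustrative, not the CCDB schema).
conn = sqlite3.connect("ccdb_demo.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS constants (name TEXT PRIMARY KEY, value REAL)")
conn.execute("INSERT OR REPLACE INTO constants VALUES ('bfield_scale', 0.9950)")
conn.commit()
conn.close()

# A batch job reads the constants directly from the file -- no server process,
# hence no per-job memory footprint on a central machine.
conn = sqlite3.connect("ccdb_demo.sqlite")
scale = conn.execute(
    "SELECT value FROM constants WHERE name = 'bfield_scale'").fetchone()[0]
conn.close()
print(scale)  # 0.995
```

Because the "database" is a single ordinary file, shipping calibrations to a remote site reduces to copying that file.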
Remarks on this issue:
- The magnetic field is not a good fit for a relational database. This incident illustrates the problem. It is only in the database as a natural consequence of evolution from a standard directory-tree/file-based system.
- David mentioned that there is a feature in JANA that will dump all calibration constants used in a job to a local file system. Those files could then be distributed for use by others. This would also by-pass the need for each job to connect to a database server.
- One component of the problem is that each running job holds a persistent database connection for the duration of the job, so each job consumes memory on the server. The CCDB has a feature under development to close these connections after a suitable idle time and re-connect later if necessary. That would also solve the problem and will be implemented.
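The close-and-reconnect strategy can be sketched generically. This is a hypothetical wrapper, not the CCDB code, and SQLite stands in for the real server: a connection that has sat idle past a timeout is dropped, and the next request transparently reopens it.

```python
import sqlite3
import time

class LazyConnection:
    """Hold a DB connection only while it is being used.

    Illustrative sketch of the close/reconnect strategy discussed for CCDB;
    it is not the CCDB implementation. SQLite stands in for the real server.
    """

    def __init__(self, path, idle_seconds=60.0):
        self.path = path
        self.idle_seconds = idle_seconds
        self._conn = None
        self._last_used = 0.0

    def query(self, sql):
        now = time.monotonic()
        # Drop a connection that has sat idle too long; this releases
        # the corresponding resources on the server side.
        if self._conn is not None and now - self._last_used > self.idle_seconds:
            self._conn.close()
            self._conn = None
        # Reconnect lazily when the next request arrives.
        if self._conn is None:
            self._conn = sqlite3.connect(self.path)
        self._last_used = now
        return self._conn.execute(sql).fetchall()

db = LazyConnection(":memory:", idle_seconds=5.0)
rows = db.query("SELECT 1")
print(rows)  # [(1,)]
```

With this pattern a thousand mostly-idle jobs cost the server almost nothing between requests.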
- We have already done most of the development on using "resources" in JANA. This is a system for caching large files (like say, magnetic field maps), locally as needed. Full implementation of this feature would also have avoided this problem. This also is still in the plan.
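The "resources" idea, fetching a large file once and reusing a local copy, can be sketched as follows. The fetch function and cache layout are hypothetical, not the JANA implementation:

```python
import os
import tempfile

# Hypothetical cache directory; JANA's actual resource layout may differ.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "resource_cache_demo")

def fetch_remote(name, dest):
    """Stand-in for a real download; here we just write a dummy field map."""
    with open(dest, "w") as f:
        f.write("magnetic field map for %s\n" % name)

def get_resource(name):
    """Return a local path for the named resource, fetching only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local = os.path.join(CACHE_DIR, name)
    if not os.path.exists(local):   # cache miss: fetch once
        fetch_remote(name, local)
    return local                    # subsequent jobs reuse the cached copy

path1 = get_resource("solenoid_map_demo")
path2 = get_resource("solenoid_map_demo")  # second call hits the cache
print(path1 == path2)  # True
```

After the first job on a node pays the download cost, every later job reads the field map from local disk instead of asking the database server for it.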
- It is also possible to increase the memory limit on the server, but that now seems moot.
Now that this problem appears solved, the next task is to concentrate on the few percent of jobs that fail for other reasons. More mini-DCs are to come.
Vertex Smearing
Kei showed slides outlining the methods we use to randomize the location of the primary interaction vertex in our simulation code and proposed that we unify the scheme in a single code location, his preference being hdgeant.
There are different smearing schemes for the different event generators. David pointed out that the only place to have a single scheme is indeed hdgeant, since the particle gun, for example, is internal to hdgeant. He volunteered to look into adding this feature. It would be optional, controlled by the control.in file.
This feature would have to be re-implemented for the Geant4 version of hdgeant, but that should not be a huge problem.
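As a rough illustration of what a unified smearing scheme does: draw each primary vertex with a Gaussian transverse beam-spot profile and a uniform distribution along the target. All parameter values below are hypothetical stand-ins; in the proposed scheme they would come from control.in rather than being hard-coded in each generator.

```python
import random

# Hypothetical beam-spot and target parameters (illustrative numbers only;
# in the proposed scheme these would be read from control.in).
SIGMA_X_CM = 0.05        # transverse beam-spot width in x
SIGMA_Y_CM = 0.05        # transverse beam-spot width in y
TARGET_Z_MIN_CM = 50.0   # upstream end of target
TARGET_Z_MAX_CM = 80.0   # downstream end of target

def smear_vertex(rng=random):
    """Draw one primary-vertex position: Gaussian in x, y; uniform in z."""
    x = rng.gauss(0.0, SIGMA_X_CM)
    y = rng.gauss(0.0, SIGMA_Y_CM)
    z = rng.uniform(TARGET_Z_MIN_CM, TARGET_Z_MAX_CM)
    return (x, y, z)

random.seed(1)  # fixed seed for a reproducible example
x, y, z = smear_vertex()
print(TARGET_Z_MIN_CM <= z <= TARGET_Z_MAX_CM)  # True
```

Keeping this in one place (hdgeant, per the discussion) means every generator's events get identical vertex distributions without duplicating the logic.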