GlueX Offline Meeting, September 20, 2017

From GlueXWiki

GlueX Offline Software Meeting
Wednesday, September 20, 2017
11:00 am EDT
JLab: CEBAF Center F326/327

Agenda

  1. Announcements
    1. Automatic updates of ccdb.sqlite on OASIS (Mark)
    2. HDvis progress
  2. Review of minutes from the last meeting (all)
  3. Report from the JLab Computing Steering Committee Meeting (Mark)
  4. Review of recent pull requests (all)
  5. Review of recent discussion on the GlueX Software Help List (all)
  6. Action Item Review (all)

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb/html/talks/2017 on the JLab CUE. This directory is accessible on the web at https://halldweb.jlab.org/talks/2017/.

Minutes

Present:

  • CMU: Curtis Meyer
  • FSU: Brad Cannon, Sean Dobbs
  • Glasgow: Peter Pauli
  • JLab: Alex Austregesilo, Thomas Britton, Eugene Chudakov, Sebastian Cole, Sergey Furletov, Mark Ito (chair), David Lawrence, Justin Stevens, Simon Taylor, Beni Zihlmann

There is a recording of this meeting on the BlueJeans site. Use your JLab credentials to access it.

Announcements

  1. Automatic updates of ccdb.sqlite on OASIS. Mark has set up a system to keep the CCDB on OASIS up to date for grid jobs. A few authentication wrinkles still need to be ironed out.
  2. HDvis progress. Thomas summarized recent progress. See the recording starting at 6:50 for the visuals.
    • Recent work has focused on performance. Overall, rendering speed has improved by a factor of 3 to 5.
    • Previously all of the BCAL modules were rendered; now only the hit ends are shown.
    • Only the hit bars of the TOF are shown.
    • The FCAL no longer appears cut in half when viewed from certain angles.
    • There is a play/pause button now.
    • In the future, there will be a time-selecting slider bar.

Review of minutes from the last meeting

We went over the minutes from September 6; there was no significant discussion.

Report from the JLab Computing Steering Committee Meeting

Chip Watson gave a summary of SciComp activities at the meeting on August 23. See his slides for the details. Some highlights directly relevant for us:

  • A new work disk server is coming soon.
  • A NERSC share has been granted to explore using that facility to augment our computing capacity beyond that of the JLab farm.

Sean asked whether SciComp would provide tools to help use this new resource. Mark replied that SciComp is willing to commit manpower to tool development, but that work has not started yet. Relatedly, David reported that his Lab-Directed Research and Development (LDRD) proposal has been approved. This effort to develop a next-generation JANA includes benchmarking the new system at NERSC, and he has been awarded a NERSC share of his own, independent of the one Chip announced. That effort will give Hall D some in-house experience running on this new (to us) platform.

Computing Resources

Alex brought up two important issues.

Analysis of REST Files

Alex reported that there are many jobs on the farm these days from individual GlueX collaborators running over REST files. No production launches are in progress, so at present these jobs cause no conflict, but conflicts could arise in the future. We should encourage use of the clusters at collaborating institutions for these tasks, as is being done at CMU and IU.

Sean pointed out that analysis launches have been the most efficient way to go through the REST data in a way that benefits the greatest number of individual analyses. After some discussion we decided to institute monthly analysis launches over the REST data, with more frequent launches if there is pressing demand. [Added in press: launches will be on the last Friday of every month. The next one will therefore be on Friday, September 29. Mark your calendar.]
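The "last Friday of every month" rule is plain date arithmetic. The following is a minimal Python sketch for finding the launch date in a given month; the function name last_friday is hypothetical and not part of any GlueX tool.

```python
import calendar
import datetime


def last_friday(year, month):
    """Return the date of the last Friday in the given month (hypothetical helper)."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, n_days = calendar.monthrange(year, month)
    last_day = datetime.date(year, month, n_days)
    # weekday(): Monday == 0 ... Friday == 4; step back to the most recent Friday
    offset = (last_day.weekday() - calendar.FRIDAY) % 7
    return last_day - datetime.timedelta(days=offset)


if __name__ == "__main__":
    print(last_friday(2017, 9))  # 2017-09-29
```

For September 2017 this gives Friday, September 29, which matches the date announced above.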

Cache Disk Space

Alex reported that SciComp recently reduced the pin quota on our cache disk from 350 TB[?] to 160 TB. As a result, REST data have been deleted. Because less REST data is now on disk, the individual-collaborator jobs mentioned in the previous section have slowed down: they must wait for their input files to be retrieved from tape.

Mark reminded us that our original plans included a rough estimate of 500 TB of disk space for reconstructed data. The plan was to put these data on the work disk, where we control when data go on and when they come off. Now that we have the pinning mechanism, the cache disk seems a much more natural place, since all of the data are on tape and the cache disk is a partial mirror of tape. In fact, we probably underestimated the amount needed: for the intermediate term, 1 PB seems a more reasonable number. Although disk space is budget-limited in the near term, we should think about increasing our request.

Other points:

  • Alex told us that the REST data for Spring 17 alone amount to 120 TB, and Spring 16 is not much smaller. On its own, Spring 17 REST would nearly saturate the current pin quota, which is already maxed out with analysis-launch products.
  • When raw data are processed, the raw files are charged against our pin quota. We cannot unpin them when the jobs finish; that is under the control of the farm system. At times this can be a large amount of data.
  • Unpinned files are deleted on the basis of modification time (when the file was put on disk) rather than access time (when the file was last read). Using access time would be more efficient: frequently used files would stay on disk, eliminating the need to re-fetch them from tape.
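The difference between the two deletion policies can be illustrated with a toy model. The following Python sketch is not the actual JLab cache-disk implementation; the CachedFile class, the evict function, and the file names, sizes, and times are all hypothetical. It deletes the "oldest" files until usage fits the capacity, where "oldest" is defined either by modification time or by access time.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_tb: float
    put_time: int     # when the file was staged to disk (modification time)
    last_access: int  # when the file was last read (access time)


def evict(files, capacity_tb, key):
    """Delete files in ascending order of `key` until usage fits capacity_tb.

    Returns (kept, deleted) as lists of file names.
    """
    candidates = sorted(files, key=key)
    used = sum(f.size_tb for f in files)
    deleted = []
    while used > capacity_tb and candidates:
        victim = candidates.pop(0)  # oldest according to the chosen policy
        used -= victim.size_tb
        deleted.append(victim.name)
    kept = [f.name for f in files if f.name not in deleted]
    return kept, deleted


# Hypothetical example: "hot" was staged first but is read constantly.
files = [
    CachedFile("hot", 1.0, put_time=0, last_access=9),
    CachedFile("a", 1.0, put_time=1, last_access=1),
    CachedFile("b", 1.0, put_time=2, last_access=2),
]
print(evict(files, 2.0, key=lambda f: f.put_time))     # (['a', 'b'], ['hot'])
print(evict(files, 2.0, key=lambda f: f.last_access))  # (['hot', 'b'], ['a'])
```

Under the modification-time policy, the frequently read file is deleted because it was staged first and would have to be re-fetched from tape. Under the access-time policy, it stays on disk and an idle file is removed instead.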

This discussion highlighted the need for a more detailed analysis of disk requirements. We should review previous estimates and understand what changes are called for now that we have real experience. Justin pointed out that we are planning another reconstruction pass over the raw data in a few weeks (with corrected FCAL calibration, updated CDC dE/dx calculations, updated TOF timing constants, etc.), and we should look ahead to what that will require.

Review of recent pull requests

We went over the pull requests since the last meeting.

Richard Jones checked in a change that simulates Cerenkov radiation in the FCAL light guides. David reminded us that he has seen this effect for particles that pass through the lead glass. He also noted that, now that the effect is in the simulation, it needs to be incorporated into mcsmear as an effective contribution to the "energy" of these particles.

Review of recent discussion on the GlueX Software Help List

We looked at the list. There was no significant discussion.

Action Item Review

  1. Institute a monthly Analysis Launch. (Alex)
  2. Review and update the disk resource estimate. (Mark)
  3. Incorporate light-guide-induced Cerenkov radiation in mcsmear. (TBA)