GlueX Offline Meeting, January 21, 2015

GlueX Offline Software Meeting
Wednesday, January 21, 2015
1:30 pm EST
JLab: CEBAF Center F326/327

Agenda

  1. Announcements
    1. Volatile disk expanded: reservation 10 -> 20 TB, quota 30 -> 50 TB
    2. Marty Wise working on Run Conditions (Control?) Database (RCDB)
    3. Computer Center has RHEL7 available for beta testers
    4. Work disk full
  2. Review of minutes from January 7 (all)
  3. Data Challenge 3
  4. Software Review Preparations
  5. Commissioning Run Review:
    1. Offline Monitoring Report (Kei)
      1. Ran over all files (online plugins, 2-track EVIO skim, REST) 2 weeks ago
      2. Next launch is this Friday
      3. Will be testing EventStore to mark events
      4. Quick update on CentOS 6.5 and multithreaded processing
    2. Commissioning-branch-to-trunk migration (Simon)
    3. Handling changing magnetic field settings (Sean)
    4. Analysis of REST file data (Justin)
    5. Data Management (Sean)
      1. Storing software information in REST files
      2. EVIO format definition for Level 3 trigger farm
      3. EventStore: implementation plan
  6. Requests to SciComp on farm features (Kei)
    1. Tools to track jobs:
      1. Tools to track what percentage of nodes were being used by whom at a given time, preferably in both number of jobs and threads. We can see the pie charts, for example, at http://scicomp.jlab.org/scicomp/#/auger/usage, but would like the information in a form that we can easily access and analyze.
      2. What percentage of nodes is currently available for each OS at a given time.
      3. Tools to track the time a job spends in each stage, such as sitting in the queue, waiting for files from tape, running, etc.
      4. Would it be possible to make the stdout and stderr web-viewable?
      5. If possible, can you add the ability to search by “job name” (matching every job whose name includes the search term) on the Auger custom job-query website?
    2. For more general requests:
      1. Better transparency about problems in the system, such as heavy user traffic, broken disks, etc. Could there be an email list or web page for that information?
      2. Clarification of how job 'priority' works across different halls and users.
      3. Would it be possible for the system to automatically resubmit failed jobs when the failure is on the system side (e.g., bad farm nodes, temporary loss of connection)?
    3. Additionally, ask for more space on the cache disk?
  7. HDDM versions and backward compatibility
  8. Action Item Review

Communication Information

Remote Connection

Slides

Talks can be deposited in the directory /group/halld/www/halldweb1/html/talks/2015 on the JLab CUE. This directory is accessible from the web at https://halldweb1.jlab.org/talks/2015/.
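
For example, a talk can be copied into place from any CUE machine with write access to the group disk. A minimal sketch in Python (the source file name offline_update.pdf is a hypothetical placeholder):

  #!/usr/bin/env python
  # Minimal sketch: copy a talk into the 2015 talks area on the JLab CUE.
  # Assumes this runs on a CUE machine with write access to the group disk;
  # the file name "offline_update.pdf" is a placeholder, not an actual talk.
  import shutil

  talks_dir = "/group/halld/www/halldweb1/html/talks/2015"
  shutil.copy("offline_update.pdf", talks_dir)
  # The talk should then be visible on the web at:
  #   https://halldweb1.jlab.org/talks/2015/offline_update.pdf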