Raid-to-Silo Transfer Strategy

From GlueXWiki
Revision as of 16:33, 24 October 2013 by Wolin

Below is a proposal for a raid-to-silo transfer strategy for moving Hall D data files from our local raid server to the JLab tape storage facility. We will update this as our ideas develop.

Elliott Wolin
Dave Lawrence
24-Oct-2013


Notes

  • We will use the jmirror facility from the Computer Center to transfer the files.
  • jmirror deletes the link to the file when the transfer is complete. It does not delete directories, only files.
  • jmirror is fairly smart and reliable. It only deletes the hard link when the file is safely transferred.
  • CRON jobs will delete unneeded directories after their contents are safely transferred.
  • jmirror is run periodically via a CRON job; it is not a transfer server system. It transfers the files it finds when it is run.
  • jmirror will not transfer files actively being written to, nor transfer files twice if invoked twice.
  • Additional hard links to the data file are untouched by jmirror. These can be used to keep the file on disk after transfer.
  • If files are kept they will be deleted "just-in-time" to make room for new DAQ files. This will require a cleanup strategy and cron scripts to implement it.
  • The DAQ creates a 10 GB file every 30 seconds, about 1 TB/hour. Thus a two-hour run generates about 2 TB.
  • It is preferable to transfer files as they are ready for transfer, and not wait for the run to end before initiating transfer.
  • The simplest way to implement immediate transfer is for run control to run a script every time the ER closes a file.
  • Vardan and Carl are working out a simple scheme to allow users to specify such a script and have it run when a file is closed.
  • Mark I prefers to store files by "run period" with a simple naming scheme (RunPeriod001, RunPeriod002 or similar).
  • Run periods are just date ranges. Run numbers will NOT be reused, i.e. all run numbers are unique across all run periods.
  • Due to constraints in the mss a second level of directories is needed. Mark and I propose simply organizing files by run, e.g. something like Run000001, Run000002, etc.
  • Run files will have the run number in their names, e.g. something like Run000001.evio.001, Run000001.evio.002, etc.
  • A two-hour run will generate around 250 files.
  • The RAID system stripes data across all disks, independent of logical partitioning.
  • RAID disk partitions do not seem to be needed (see below); they can be implemented later if necessary.
  • Hard links cannot span partitions: ln cannot create one across partitions, and mv across partitions physically copies the file.
  • The RAID server must simultaneously read and write at 300 MB/s, so it's best to avoid additional file copying.
  • Note that we have two completely independent RAID servers, 75 TB each.
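
The hard-link behavior described in the notes above can be illustrated with a minimal Python sketch. It is not part of the proposal: the directory names `staging` and `keep` and the function `keep_copy_via_hard_link` are hypothetical, and the `os.unlink` call is only a stand-in for jmirror removing its own link after a successful transfer. The point it shows is that a second hard link keeps the data on disk without copying it, provided both directories are on the same partition.

```python
import os
import tempfile


def keep_copy_via_hard_link(data_file, keep_dir):
    """Create a second hard link to data_file inside keep_dir and return its path.

    No data is copied; both names refer to the same file on disk.
    os.link raises OSError if keep_dir is on a different partition.
    """
    os.makedirs(keep_dir, exist_ok=True)
    kept = os.path.join(keep_dir, os.path.basename(data_file))
    os.link(data_file, kept)
    return kept


def demo():
    """Show that the kept link survives removal of the original link."""
    with tempfile.TemporaryDirectory() as top:
        staging = os.path.join(top, "staging")  # hypothetical dir scanned by jmirror
        keep = os.path.join(top, "keep")        # hypothetical dir for retained links
        os.makedirs(staging)

        data_file = os.path.join(staging, "Run000001.evio.001")
        with open(data_file, "wb") as f:
            f.write(b"event data")

        kept = keep_copy_via_hard_link(data_file, keep)
        links_before = os.stat(kept).st_nlink

        # Stand-in for jmirror deleting its link once the file is on tape.
        os.unlink(data_file)

        with open(kept, "rb") as f:
            content = f.read()
        return links_before, os.stat(kept).st_nlink, content


if __name__ == "__main__":
    print(demo())
```

Because the link count drops from two to one rather than to zero, the file contents stay on the RAID until the remaining link is removed by a later cleanup step.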


Notes for Dec 2013 Online Data Challenge

  • We plan to use a basic automated file transfer mechanism in Dec that deletes files on transfer. If someone has the time we'll try just-in-time deletion.
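
One possible shape for just-in-time deletion is sketched below in Python, as an illustration only. It assumes that retained files live as extra hard links in a single hypothetical directory (`keep_dir`), that "oldest first" by modification time is an acceptable policy, and that the caller already knows how many bytes the DAQ needs. None of these choices come from the proposal, and the function names are invented for this example.

```python
import os


def select_for_deletion(files, bytes_needed):
    """Pick the oldest files whose combined size is at least bytes_needed.

    files is an iterable of (path, size_bytes, mtime) tuples.
    Returns the chosen paths, oldest first. If the files cannot cover
    bytes_needed, all of them are returned.
    """
    chosen = []
    freed = 0
    for path, size, _mtime in sorted(files, key=lambda entry: entry[2]):
        if freed >= bytes_needed:
            break
        chosen.append(path)
        freed += size
    return chosen


def free_space_just_in_time(keep_dir, bytes_needed):
    """Remove the oldest retained links in keep_dir and return the removed paths."""
    entries = []
    for name in os.listdir(keep_dir):
        path = os.path.join(keep_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((path, st.st_size, st.st_mtime))

    victims = select_for_deletion(entries, bytes_needed)
    for path in victims:
        os.unlink(path)  # disk space is released only when this was the last link
    return victims
```

Under these assumptions, removing a retained link cannot lose untransferred data: if jmirror has not yet transferred the file, its own link still exists and the data stays on disk. The flip side is that deleting such a link frees no space, so a real script would need to check the space actually reclaimed.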


Proposal