Farm Job Tracking Database

Main Table

Listing of all runs/files that are in the plan and their status.

  • a script uses this information to submit jobs and to check their status
  • to resubmit a job, mark its run/file as not submitted
  • same system as was used for DC1
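
The submit/resubmit cycle described above can be sketched roughly as follows. This is a minimal illustration, not the actual production script: an in-memory SQLite database stands in for the MySQL server, only a few of the dc_02 columns are created, and the function names (submit_pending, mark_for_resubmission) and the submit_job hook are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server; only the
# columns needed for this sketch are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dc_02 (
           run INTEGER NOT NULL DEFAULT 0,
           file INTEGER NOT NULL DEFAULT 0,
           submitted INTEGER NOT NULL DEFAULT 0,
           output INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (run, file)
       )"""
)
conn.executemany(
    "INSERT INTO dc_02 (run, file) VALUES (?, ?)",
    [(9001, 2000010), (9001, 2000017), (9001, 2000019)],
)


def submit_pending(conn, submit_job):
    """Submit every run/file that is not yet marked as submitted."""
    rows = conn.execute(
        "SELECT run, file FROM dc_02 WHERE submitted = 0 ORDER BY run, file"
    ).fetchall()
    for run, file in rows:
        submit_job(run, file)  # hypothetical hook for the real farm submission
        conn.execute(
            "UPDATE dc_02 SET submitted = 1 WHERE run = ? AND file = ?",
            (run, file),
        )
    conn.commit()
    return rows


def mark_for_resubmission(conn, run, file):
    """Clear the submitted flag so the next pass picks the run/file up again."""
    conn.execute(
        "UPDATE dc_02 SET submitted = 0 WHERE run = ? AND file = ?",
        (run, file),
    )
    conn.commit()


# The lambda only records what would have been submitted.
submitted_log = []
first_pass = submit_pending(conn, lambda r, f: submitted_log.append((r, f)))
mark_for_resubmission(conn, 9001, 2000017)
second_pass = submit_pending(conn, lambda r, f: submitted_log.append((r, f)))
```

In this sketch the first pass submits all three entries and the second pass submits only the one whose flag was cleared.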

Description

mysql> describe dc_02;
+------------------+--------------+------+-----+-------------------+-----------------------------+
| Field            | Type         | Null | Key | Default           | Extra                       |
+------------------+--------------+------+-----+-------------------+-----------------------------+
| run              | int(11)      | NO   | PRI | 0                 |                             |
| file             | mediumint(9) | NO   | PRI | 0                 |                             |
| submitted        | tinyint(4)   | NO   |     | 0                 |                             |
| output           | tinyint(4)   | NO   |     | 0                 |                             |
| jput_submitted   | tinyint(4)   | NO   |     | 0                 |                             |
| silo             | tinyint(4)   | NO   |     | 0                 |                             |
| jcache_submitted | tinyint(4)   | NO   |     | 0                 |                             |
| cache            | tinyint(4)   | NO   |     | 0                 |                             |
| mod_time         | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+--------------+------+-----+-------------------+-----------------------------+
9 rows in set (0.00 sec)

Example

mysql> select run, file, submitted, output, jput_submitted, silo, mod_time from dc_02 limit 10;
+------+---------+-----------+--------+----------------+------+---------------------+
| run  | file    | submitted | output | jput_submitted | silo | mod_time            |
+------+---------+-----------+--------+----------------+------+---------------------+
| 9001 | 2000019 |         1 |      0 |              0 |    0 | 2014-03-22 02:23:32 |
| 9001 | 2000065 |         1 |      0 |              0 |    0 | 2014-03-22 02:25:02 |
| 9001 | 2000062 |         1 |      0 |              0 |    0 | 2014-03-22 02:24:56 |
| 9001 | 2000010 |         1 |      0 |              0 |    0 | 2014-03-22 02:23:14 |
| 9001 | 2000017 |         1 |      0 |              0 |    0 | 2014-03-22 02:23:28 |
| 9001 | 2000022 |         1 |      0 |              0 |    0 | 2014-03-22 02:23:38 |
| 9001 | 2000059 |         1 |      0 |              0 |    0 | 2014-03-22 02:24:50 |
| 9001 | 2000088 |         1 |      0 |              0 |    0 | 2014-03-22 02:25:47 |
| 9001 | 2000025 |         1 |      0 |              0 |    0 | 2014-03-22 02:23:43 |
| 9001 | 2000057 |         1 |      0 |              0 |    0 | 2014-03-22 02:24:46 |
+------+---------+-----------+--------+----------------+------+---------------------+
10 rows in set (0.00 sec)

Job Table

Listing of all jobs submitted.

  • information is captured from the JLab batch farm system (Auger) database via a JSON web service
    • previously:
      • resource usage came from the standard output file
      • times came from the "jobstat" command, information that disappeared after the job finished
    • the job ID itself is captured at submit time
  • a particular run/file may have more than one job if it had to be resubmitted
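
Because one run/file can map to several jobs, queries against the job table often need to group by run and file. The sketch below shows one possible way to list resubmitted run/file pairs and to pick the most recent job for each. It uses an in-memory SQLite database as a stand-in for MySQL, creates only a subset of the dc_02Job columns, assumes the highest auto-increment id is the latest job, and uses made-up rows (the job IDs and the "FAILED" result value are illustrative, not taken from the farm).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; subset of columns only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dc_02Job (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           run INTEGER,
           file INTEGER,
           jobId INTEGER,
           status TEXT,
           result TEXT
       )"""
)
# Illustrative rows, not real farm data: file 2000010 was resubmitted once.
conn.executemany(
    "INSERT INTO dc_02Job (run, file, jobId, status, result) VALUES (?, ?, ?, ?, ?)",
    [
        (9001, 2000010, 100, "DONE", "FAILED"),   # hypothetical result value
        (9001, 2000010, 105, "DONE", "SUCCESS"),
        (9001, 2000017, 101, "DONE", "SUCCESS"),
    ],
)

# Run/file pairs that have more than one job, i.e. were resubmitted.
resubmitted = conn.execute(
    """SELECT run, file, COUNT(*) AS n_jobs
       FROM dc_02Job
       GROUP BY run, file
       HAVING COUNT(*) > 1"""
).fetchall()

# Most recent job per run/file, taking the highest id as the latest entry.
latest = conn.execute(
    """SELECT j.run, j.file, j.jobId, j.result
       FROM dc_02Job AS j
       JOIN (SELECT MAX(id) AS max_id
             FROM dc_02Job
             GROUP BY run, file) AS m
         ON j.id = m.max_id
       ORDER BY j.run, j.file"""
).fetchall()
```

The same GROUP BY and JOIN statements should carry over to MySQL largely unchanged, although that has not been checked against the production database.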

Description

mysql> describe dc_02Job;
+-----------------+---------------+------+-----+-------------------+------------------------------+
| Field           | Type          | Null | Key | Default           | Extra                        |
+-----------------+---------------+------+-----+-------------------+------------------------------+
| id              | int(11)       | NO   | PRI | NULL              | auto_increment               |
| run             | int(11)       | YES  |     | NULL              |                              |
| file            | int(11)       | YES  |     | NULL              |                              |
| jobId           | int(11)       | YES  |     | NULL              |                              |
| timeChange      | timestamp     | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP  |
| username        | varchar(64)   | YES  |     | NULL              |                              |
| project         | varchar(64)   | YES  |     | NULL              |                              |
| name            | varchar(64)   | YES  |     | NULL              |                              |
| queue           | varchar(64)   | YES  |     | NULL              |                              |
| hostname        | varchar(64)   | YES  |     | NULL              |                              |
| nodeTags        | varchar(64)   | YES  |     | NULL              |                              |
| coresRequested  | int(11)       | YES  |     | NULL              |                              |
| memoryRequested | int(11)       | YES  |     | NULL              |                              |
| status          | varchar(64)   | YES  |     | NULL              |                              |
| exitCode        | int(11)       | YES  |     | NULL              |                              |
| result          | varchar(64)   | YES  |     | NULL              |                              |
| timeSubmitted   | datetime      | YES  |     | NULL              |                              |
| timeDependency  | datetime      | YES  |     | NULL              |                              |
| timePending     | datetime      | YES  |     | NULL              |                              |
| timeStagingIn   | datetime      | YES  |     | NULL              |                              |
| timeActive      | datetime      | YES  |     | NULL              |                              |
| timeStagingOut  | datetime      | YES  |     | NULL              |                              |
| timeComplete    | datetime      | YES  |     | NULL              |                              |
| walltime        | varchar(8)    | YES  |     | NULL              |                              |
| cput            | varchar(8)    | YES  |     | NULL              |                              |
| mem             | varchar(64)   | YES  |     | NULL              |                              |
| vmem            | varchar(64)   | YES  |     | NULL              |                              |
| script          | varchar(1024) | YES  |     | NULL              |                              |
| files           | varchar(1024) | YES  |     | NULL              |                              |
| error           | varchar(1024) | YES  |     | NULL              |                              |
+-----------------+---------------+------+-----+-------------------+------------------------------+
30 rows in set (0.00 sec)

Examples

Completed Jobs

mysql> select run, file, jobId, hostname, status, result, timeSubmitted, timeActive, timeComplete, cput, mem, vmem from dc_02Job limit 10;
+------+---------+---------+------------+--------+---------+---------------------+---------------------+---------------------+----------+----------+-----------+
| run  | file    | jobId   | hostname   | status | result  | timeSubmitted       | timeActive          | timeComplete        | cput     | mem      | vmem      |
+------+---------+---------+------------+--------+---------+---------------------+---------------------+---------------------+----------+----------+-----------+
| 9002 | 2001429 | 6355302 | qcd12s0220 | DONE   | SUCCESS | 2014-03-24 11:55:20 | 2014-03-24 12:34:39 | 2014-03-25 13:44:18 | 25:10:17 | 698604kb | 1016384kb |
| 9001 | 2004958 | 6372009 | qcd12s0423 | DONE   | SUCCESS | 2014-03-24 18:37:13 | 2014-03-26 17:17:36 | 2014-03-27 14:14:14 | 20:57:50 | 721724kb | 1081992kb |
| 9003 | 2000891 | 6332786 | farm10016  | DONE   | SUCCESS | 2014-03-23 13:54:58 | 2014-03-24 10:23:56 | 2014-03-25 08:53:01 | 22:35:26 | 810652kb | 1147524kb |
| 9002 | 2001722 | 6357568 | farm09021  | DONE   | SUCCESS | 2014-03-24 13:17:32 | 2014-03-25 12:31:28 | 2014-03-26 10:50:00 | 22:16:05 | 700876kb | 1016460kb |
| 9001 | 2004651 | 6371701 | farm10017  | DONE   | SUCCESS | 2014-03-24 18:26:46 | 2014-03-26 17:08:22 | 2014-03-27 11:16:58 | 18:08:36 | 729572kb | 1083076kb |
| 9002 | 2000415 | 6306483 | farm09022  | DONE   | SUCCESS | 2014-03-22 15:14:09 | 2014-03-22 15:45:15 | 2014-03-23 15:12:51 | 23:23:01 | 773828kb | 1081984kb |
| 9002 | 2001974 | 6357824 | qcd12s0727 | DONE   | SUCCESS | 2014-03-24 13:26:17 | 2014-03-25 12:38:18 | 2014-03-26 13:58:53 | 25:20:49 | 751380kb | 1081984kb |
| 9001 | 2000692 | 6330148 | farm13014  | DONE   | SUCCESS | 2014-03-23 12:22:03 | 2014-03-23 12:43:30 | 2014-03-24 02:14:33 | 13:31:49 | 858756kb | 1213052kb |
| 9001 | 2000423 | 6305951 | farm11015  | DONE   | SUCCESS | 2014-03-22 14:47:11 | 2014-03-22 15:00:19 | 2014-03-23 08:17:58 | 17:17:39 | 795508kb | 1151836kb |
| 9003 | 2000956 | 6332891 | farm13019  | DONE   | SUCCESS | 2014-03-23 13:57:14 | 2014-03-24 10:28:13 | 2014-03-25 03:20:37 | 16:58:54 | 806416kb | 1147520kb |
+------+---------+---------+------------+--------+---------+---------------------+---------------------+---------------------+----------+----------+-----------+
10 rows in set (0.00 sec)
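
The time and resource columns of a completed job can be combined into derived quantities such as queue wait and run duration. The following sketch does this for the first row of the listing above. It assumes that cput is formatted as hours:minutes:seconds and that mem carries a "kb" suffix, as the sample rows suggest; the helper names are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string (assumed cput format) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def kb_to_int(text):
    """Strip the 'kb' suffix seen in the mem/vmem columns."""
    if not text.endswith("kb"):
        raise ValueError(f"unexpected memory format: {text!r}")
    return int(text[:-2])


# First row of the completed-jobs listing (jobId 6355302).
row = {
    "timeSubmitted": "2014-03-24 11:55:20",
    "timeActive": "2014-03-24 12:34:39",
    "timeComplete": "2014-03-25 13:44:18",
    "cput": "25:10:17",
    "mem": "698604kb",
}

submitted = datetime.strptime(row["timeSubmitted"], FMT)
active = datetime.strptime(row["timeActive"], FMT)
complete = datetime.strptime(row["timeComplete"], FMT)

queue_wait_s = int((active - submitted).total_seconds())  # time spent waiting
run_s = int((complete - active).total_seconds())          # active to complete
cpu_s = hms_to_seconds(row["cput"])                       # CPU time used
mem_kb = kb_to_int(row["mem"])

print(queue_wait_s, run_s, cpu_s, mem_kb)
```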

Running Jobs

mysql> select run, file, jobId, hostname, status, result, timeSubmitted, timeActive, timeComplete, cput, mem, vmem from dc_02Job where status = 'active' limit 10;
+------+---------+---------+------------+--------+--------+---------------------+---------------------+--------------+------+------+------+
| run  | file    | jobId   | hostname   | status | result | timeSubmitted       | timeActive          | timeComplete | cput | mem  | vmem |
+------+---------+---------+------------+--------+--------+---------------------+---------------------+--------------+------+------+------+
| 9001 | 2007001 | 6418792 | farm12009  | ACTIVE | NULL   | 2014-03-26 23:17:11 | 2014-03-27 18:50:16 | NULL         | NULL | NULL | NULL |
| 9001 | 2007002 | 6418793 | farm10015  | ACTIVE | NULL   | 2014-03-26 23:18:22 | 2014-03-27 18:50:17 | NULL         | NULL | NULL | NULL |
| 9001 | 2007003 | 6418794 | farm09019  | ACTIVE | NULL   | 2014-03-26 23:18:24 | 2014-03-27 18:50:17 | NULL         | NULL | NULL | NULL |
| 9001 | 2007004 | 6418795 | qcd12s0707 | ACTIVE | NULL   | 2014-03-26 23:18:26 | 2014-03-27 18:50:25 | NULL         | NULL | NULL | NULL |
| 9001 | 2007005 | 6418796 | farm13023  | ACTIVE | NULL   | 2014-03-26 23:18:28 | 2014-03-27 18:50:27 | NULL         | NULL | NULL | NULL |
| 9001 | 2007006 | 6418797 | farm13002  | ACTIVE | NULL   | 2014-03-26 23:18:30 | 2014-03-27 18:50:26 | NULL         | NULL | NULL | NULL |
| 9001 | 2007007 | 6418798 | farm12002  | ACTIVE | NULL   | 2014-03-26 23:18:32 | 2014-03-27 18:50:27 | NULL         | NULL | NULL | NULL |
| 9001 | 2007008 | 6418799 | farm10024  | ACTIVE | NULL   | 2014-03-26 23:18:34 | 2014-03-27 18:50:26 | NULL         | NULL | NULL | NULL |
| 9001 | 2007009 | 6418800 | farm10023  | ACTIVE | NULL   | 2014-03-26 23:18:37 | 2014-03-27 18:50:28 | NULL         | NULL | NULL | NULL |
| 9001 | 2007010 | 6418801 | farm10016  | ACTIVE | NULL   | 2014-03-26 23:18:39 | 2014-03-27 18:50:26 | NULL         | NULL | NULL | NULL |
+------+---------+---------+------------+--------+--------+---------------------+---------------------+--------------+------+------+------+
10 rows in set (0.02 sec)

Jobs in the Queue

mysql> select run, file, jobId, hostname, status, result, timeSubmitted, timeActive, timeComplete, cput, mem, vmem from dc_02Job where status = 'pending' limit 10;
+------+---------+---------+----------+---------+--------+---------------------+------------+--------------+------+------+------+
| run  | file    | jobId   | hostname | status  | result | timeSubmitted       | timeActive | timeComplete | cput | mem  | vmem |
+------+---------+---------+----------+---------+--------+---------------------+------------+--------------+------+------+------+
| 9001 | 2007769 | 6419560 | NULL     | PENDING | NULL   | 2014-03-26 23:44:54 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007770 | 6419561 | NULL     | PENDING | NULL   | 2014-03-26 23:44:56 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007771 | 6419562 | NULL     | PENDING | NULL   | 2014-03-26 23:44:59 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007772 | 6419563 | NULL     | PENDING | NULL   | 2014-03-26 23:45:01 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007773 | 6419564 | NULL     | PENDING | NULL   | 2014-03-26 23:45:03 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007774 | 6419565 | NULL     | PENDING | NULL   | 2014-03-26 23:45:05 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007775 | 6419566 | NULL     | PENDING | NULL   | 2014-03-26 23:45:07 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007776 | 6419567 | NULL     | PENDING | NULL   | 2014-03-26 23:45:09 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007777 | 6419568 | NULL     | PENDING | NULL   | 2014-03-26 23:45:11 | NULL       | NULL         | NULL | NULL | NULL |
| 9001 | 2007778 | 6419569 | NULL     | PENDING | NULL   | 2014-03-26 23:45:13 | NULL       | NULL         | NULL | NULL | NULL |
+------+---------+---------+----------+---------+--------+---------------------+------------+--------------+------+------+------+
10 rows in set (0.02 sec)