[api-dev] Agave Job Status Codes
Duvick, Jonathan P [GDCBS]
jduvick at iastate.edu
Wed Feb 3 13:29:18 MST 2016
I recently ran a test job on Stampede that succeeded (that is, it deposited the expected output in my archive), but the process monitor reported a job status of 'FAILED' immediately after 'STAGING_JOB' (see my webhook's output below at 08:50:23).
As I understand it, 'FAILED' means the job itself failed, which is in line with the documentation at http://agaveapi.co/documentation/tutorials/job-management-tutorial/ (Table 2, list of all possible job statuses).
I therefore designed my code to act accordingly (stop waiting for output), but in this case that was not the correct action.
While I am grateful that the plucky API tried and tried again and finally succeeded 😊, I wonder whether the status codes need to be tweaked so they are more in line with their intended meaning.
More useful would be something like 'STAGING_FAILED' alongside a global 'JOB_FAILED', so scripts could distinguish the two (the first non-fatal to output, the second fatal).
Thanks,
Jon Duvick
P.S. [I also notice an 'ERROR' status in the output below that is not in the official list of job statuses.]
SUBMITTED: 2016-02-03 08:50:01 |
PENDING: 2016-02-03T09:50:01.205-06:00 | |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Attempt 1 to stage job inputs |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Identifying input files for staging |
STAGING_INPUTS: 2016-02-03 08:50:06 | Copy in progress |
STAGED: 2016-02-03 08:50:14 | Job inputs staged to execution system |
SUBMITTING: 2016-02-03 08:50:19 | Preparing job for submission. |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:23 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:23 | Attempt 1 to submit job |
FAILED: 2016-02-03 08:50:23 | Attempt 1 to submit job |
ERROR: : 2016-02-03 08:50:23 |
QUEUED: 2016-02-03 08:50:39 | Attempt 1 to submit job |
RUNNING: 2016-02-03 08:50:50 | HPC job successfully placed into normal queue as local job 6502390 |
CLEANING_UP: 2016-02-03 08:50:59 | Job started running |
ARCHIVING: 2016-02-03 08:51:03 | Beginning to archive output. |
ARCHIVING: 2016-02-03 08:51:03 | Attempt 1 to archive job output |
ARCHIVING: 2016-02-03 08:51:05 | Attempt 1 to archive job output |
ARCHIVING_FINISHED: 2016-02-03 08:51:21 | Attempt 1 to archive job output |
FINISHED: 2016-02-03 08:51:22 | Job complete |
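
Until the status codes change, a client-side workaround might look like the sketch below. It is only an illustration, not part of Agave: the function names, the regular expression, and the 120-second grace period are all hypothetical, and it assumes (based solely on the output above) that a 'FAILED' or 'ERROR' event can be followed by further events when the API retries. The idea is to judge the job by its most recent status and to treat a trailing 'FAILED' as fatal only after no newer event has arrived for a while.

```python
import re

# A status name is an upper-case token followed by ": " (e.g. "FAILED: ").
STATUS_RE = re.compile(r"\b([A-Z][A-Z_]+): ")

# Assumption: these statuses may be transient, as in the output above.
SUSPECT = {"FAILED", "ERROR"}


def extract_statuses(log_text):
    """Return the status names in the order they appear in the log text."""
    return STATUS_RE.findall(log_text)


def classify(statuses):
    """Classify by the most recent status only.

    Returns 'finished', 'failed_so_far', or 'in_progress'.
    """
    if not statuses:
        return "in_progress"
    last = statuses[-1]
    if last == "FINISHED":
        return "finished"
    if last in SUSPECT:
        return "failed_so_far"
    return "in_progress"


def should_stop_waiting(statuses, seconds_since_last_event, grace_seconds=120):
    """Stop on FINISHED, or on a FAILED/ERROR that has stayed last
    for at least grace_seconds (a hypothetical, tunable value)."""
    state = classify(statuses)
    if state == "finished":
        return True
    if state == "failed_so_far":
        return seconds_since_last_event >= grace_seconds
    return False
```

With the events above, the 'FAILED' and 'ERROR' at 08:50:23 are superseded by 'QUEUED' 16 seconds later, so a script using this check would have kept waiting instead of giving up. The trade-off is that a genuinely failed job is only recognized after the grace period expires.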