[api-dev] Agave Job Status Codes
Matthew Vaughn
vaughn at tacc.utexas.edu
Wed Feb 3 13:46:24 MST 2016
This is great feedback. Can you tell me the job ID please? I’m trying to look at some other logs as part of writing an improvement request ticket to the Agave team.
On Feb 3, 2016, at 2:29 PM, Duvick, Jonathan P [GDCBS] <jduvick at iastate.edu> wrote:
I recently ran a test job on Stampede that succeeded (that is, it deposited the expected output in my archive), but the process monitor reported a job status of 'FAILED' right after 'STAGING_JOB' (see my webhook's output below at 08:50:23).
'FAILED' as I understand it means the job itself failed, which is in line with the documentation at http://agaveapi.co/documentation/tutorials/job-management-tutorial/ (Table 2, list of all possible job statuses);
hence I designed my code to act accordingly (stop waiting for output), but in this case that was not the correct action.
While I am grateful that the plucky API tried and tried again and finally succeeded [😊] , I wonder if the status codes need to be tweaked so they are more in line with their intended meaning?
More useful would be something like STAGING_FAILED or a global 'JOB_FAILED', so scripts could distinguish the two (one non-fatal to output, the other fatal).
Thanks,
Jon Duvick
P.S. [I also notice an 'ERROR' status in the output below that is not in the official list of job statuses.]
SUBMITTED: 2016-02-03 08:50:01 |
PENDING: 2016-02-03T09:50:01.205-06:00 | |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Attempt 1 to stage job inputs |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Identifying input files for staging |
STAGING_INPUTS: 2016-02-03 08:50:06 | Copy in progress |
STAGED: 2016-02-03 08:50:14 | Job inputs staged to execution system |
SUBMITTING: 2016-02-03 08:50:19 | Preparing job for submission. |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:23 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:23 | Attempt 1 to submit job |
FAILED: 2016-02-03 08:50:23 | Attempt 1 to submit job |
ERROR: : 2016-02-03 08:50:23 |
QUEUED: 2016-02-03 08:50:39 | Attempt 1 to submit job |
RUNNING: 2016-02-03 08:50:50 | HPC job successfully placed into normal queue as local job 6502390 |
CLEANING_UP: 2016-02-03 08:50:59 | Job started running |
ARCHIVING: 2016-02-03 08:51:03 | Beginning to archive output. |
ARCHIVING: 2016-02-03 08:51:03 | Attempt 1 to archive job output |
ARCHIVING: 2016-02-03 08:51:05 | Attempt 1 to archive job output |
ARCHIVING_FINISHED: 2016-02-03 08:51:21 | Attempt 1 to archive job output |
FINISHED: 2016-02-03 08:51:22 | Job complete |
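
For scripts that consume webhook output like the log above, here is a minimal sketch of one way to avoid treating 'FAILED' as final the moment it appears. It assumes each entry has the shape `STATUS: <timestamp> | <message> |`, which is inferred only from this log. The names `parse_statuses` and `classify`, and the labels they return, are hypothetical and are not part of the Agave API.

```python
import re

# Assumed entry shape, inferred only from the log in this thread:
#   STATUS: <timestamp> | <message> |
# The optional ": " tolerates the "ERROR: : <timestamp>" entry.
STATUS_RE = re.compile(r"\b([A-Z_]+):\s+(?::\s+)?\d{4}-\d{2}-\d{2}")


def parse_statuses(log: str) -> list[str]:
    """Return the status names in the order they appear in the log."""
    return STATUS_RE.findall(log)


def classify(statuses: list[str]) -> str:
    """Classify a status history using hypothetical labels (not Agave's)."""
    if statuses and statuses[-1] == "FINISHED":
        return "finished"
    if statuses and statuses[-1] in ("FAILED", "ERROR"):
        # The latest event is a failure, but as the log shows,
        # later statuses may still follow it.
        return "failed_so_far"
    return "in_progress"


sample = (
    "STAGING_JOB: 2016-02-03 08:50:23 | Attempt 1 to submit job | "
    "FAILED: 2016-02-03 08:50:23 | Attempt 1 to submit job | "
    "ERROR: : 2016-02-03 08:50:23 | "
    "QUEUED: 2016-02-03 08:50:39 | Attempt 1 to submit job | "
    "FINISHED: 2016-02-03 08:51:22 | Job complete |"
)
print(classify(parse_statuses(sample)))  # finished
```

A live script would still have to decide how long to wait after a trailing 'FAILED' before giving up; nothing in this thread specifies that interval.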