[api-dev] Agave Job Status Codes

Barthelson, Roger A - (rogerab) rogerab at email.arizona.edu
Wed Feb 3 13:40:12 MST 2016


Occasionally, I do see a job reported as failed that doesn’t seem to have failed, but more often it is the opposite problem: the job effectively fails but is reported as completing successfully.

Roger

--
Roger Barthelson Ph.D.

Scientific Analyst

CyVerse

rogerab at cyverse.org

TEL:+1-520-977-5249

http://www.cyverse.org<http://www.iplantcollaborative.org/>



On February 3, 2016 at 1:29:35 PM, Duvick, Jonathan P [GDCBS] (jduvick at iastate.edu) wrote:


I recently ran a test job on Stampede that succeeded (that is, it deposited the expected output in my archive), but the process monitor reported a job status of 'FAILED' following 'STAGING_JOB' (see my webhook's output below at 08:50:23).
'FAILED', as I understand it, means the job itself failed, which is in line with the documentation at http://agaveapi.co/documentation/tutorials/job-management-tutorial/ (Table 2, list of all possible job statuses);
hence I designed my code to act accordingly (stop waiting for output), but in this case that was not the correct action.

While I am grateful that the plucky API tried and tried again and finally succeeded 😊, I wonder if the status codes need to be tweaked so they are more in line with their intended meaning?
More useful would be something like 'STAGING_FAILED' or a global 'JOB_FAILED', so scripts could distinguish the two (one non-fatal to output, the other fatal).

Thanks,
Jon Duvick

P.S. [I also notice an 'ERROR' status in the output below that is not in the official list of job statuses.]

SUBMITTED: 2016-02-03 08:50:01 |
PENDING: 2016-02-03T09:50:01.205-06:00 |  |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Attempt 1 to stage job inputs |
PROCESSING_INPUTS: 2016-02-03 08:50:04 | Identifying input files for staging |
STAGING_INPUTS: 2016-02-03 08:50:06 | Copy in progress |
STAGED: 2016-02-03 08:50:14 | Job inputs staged to execution system |
SUBMITTING: 2016-02-03 08:50:19 | Preparing job for submission. |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:20 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:21 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:22 | Attempt 1 to submit job |
SUBMITTING: 2016-02-03 08:50:23 | Attempt 1 to submit job |
STAGING_JOB: 2016-02-03 08:50:23 | Attempt 1 to submit job |
FAILED: 2016-02-03 08:50:23 | Attempt 1 to submit job |
ERROR: : 2016-02-03 08:50:23 |
QUEUED: 2016-02-03 08:50:39 | Attempt 1 to submit job |
RUNNING: 2016-02-03 08:50:50 | HPC job successfully placed into normal queue as local job 6502390 |
CLEANING_UP: 2016-02-03 08:50:59 | Job started running |
ARCHIVING: 2016-02-03 08:51:03 | Beginning to archive output. |
ARCHIVING: 2016-02-03 08:51:03 | Attempt 1 to archive job output |
ARCHIVING: 2016-02-03 08:51:05 | Attempt 1 to archive job output |
ARCHIVING_FINISHED: 2016-02-03 08:51:21 | Attempt 1 to archive job output |
FINISHED: 2016-02-03 08:51:22 | Job complete |
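
Until the status codes distinguish retried failures from fatal ones, one possible client-side workaround is to treat 'FAILED' and 'ERROR' as provisional: give up only if no further status arrives within a grace period. The sketch below is a minimal illustration of that idea, not documented Agave behavior. The class name JobWatcher, the five-minute default, and the rule that any later status cancels an earlier failure are all assumptions; only the status names are taken from the log above.

```python
from datetime import datetime, timedelta

# Statuses treated as provisional failures (assumption based on the log above).
SOFT_FAILURES = {"FAILED", "ERROR"}


class JobWatcher:
    """Hypothetical helper that tracks webhook status events for one job."""

    def __init__(self, grace=timedelta(minutes=5)):
        self.grace = grace        # how long to wait after a failure status
        self.failed_at = None     # time of the first unresolved failure
        self.finished = False     # becomes True once FINISHED is seen

    def record(self, status, when):
        """Record one status event with its timestamp."""
        if self.finished:
            return                # FINISHED is treated as final
        if status == "FINISHED":
            self.finished = True
            self.failed_at = None
        elif status in SOFT_FAILURES:
            if self.failed_at is None:
                self.failed_at = when
        else:
            # Any other status means the job moved on (e.g. a retry worked).
            self.failed_at = None

    def outcome(self, now):
        """Return 'succeeded', 'failed', or 'pending' as of the given time."""
        if self.finished:
            return "succeeded"
        if self.failed_at is not None and now - self.failed_at >= self.grace:
            return "failed"
        return "pending"
```

Fed the sequence from the log (FAILED and ERROR at 08:50:23, then QUEUED at 08:50:39), such a watcher would keep waiting and report success once FINISHED arrives. The cost of this design is that a genuinely fatal failure is only reported after the grace period has elapsed.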