[Iplant-api-dev] Failed job but no error message

Darren Boss dboss at email.arizona.edu
Sun Nov 3 14:22:30 MST 2013


I'm also not able to query the accounting data for that particular job
on lonestar:

login1$ qacct -j 1537860
error: job id 1537860 not found

Looking at the blastout.1 file, it does seem like the job was killed
before it finished.
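
For anyone repeating this lookup across a batch of failed jobs, the steps Rion describes in the quoted message below (list the output folder, download the *.out file, read the SGE id from the "TACC:" lines) could be sketched roughly as follows. This is only a sketch under assumptions: the base URL and path layout are copied from the URLs in this thread, the helper names are hypothetical, the regular expression is inferred from the two log lines quoted below, and the actual (presumably authenticated) download is left out.

```python
import re

# Base URL as it appears in this thread; everything else here is illustrative.
BASE = "https://foundation.iplantcollaborative.org/apps-v1"


def output_list_url(job_id, path=""):
    """URL that lists a job's work folder (hypothetical helper)."""
    return f"{BASE}/job/{job_id}/output/list/{path}"


def output_file_url(job_id, path):
    """URL that downloads one file from a job's work folder (hypothetical helper)."""
    return f"{BASE}/job/{job_id}/output/{path}"


# Assumed pattern, based only on the two scheduler lines quoted in the thread:
#   TACC: Setting memory limits for job 1537860 to unlimited KB
#   TACC: Cleaning up after job: 1537860
_TACC_JOB = re.compile(r"^TACC:.*\bjob:?\s+(\d+)", re.MULTILINE)


def local_job_ids(log_text):
    """Return the distinct local (SGE) job ids found in a *.out log, in order."""
    seen = []
    for match in _TACC_JOB.finditer(log_text):
        job_id = match.group(1)
        if job_id not in seen:
            seen.append(job_id)
    return seen
```

Feeding the text of imicrobe-blast-2225-simap-32545.out to local_job_ids() should yield the id to pass to qacct -j, provided the log follows the same format as the lines quoted below.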

On Sun, Nov 3, 2013 at 4:12 PM, Darren Boss <dboss at email.arizona.edu> wrote:
> Thank you. That helped out quite a bit.
>
> There are files listed in the output list that I do not have in iRODS;
> in fact, I don't have any output files in the archive directory at
> all. It looks like the job executed correctly, judging by the download of
> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/blastout.1.
> Why is the status failed and not archiving_failed? It seems like it
> ran without failure.
>
> On Sun, Nov 3, 2013 at 10:18 AM, Rion Dooley <dooley at tacc.utexas.edu> wrote:
>> Hey Darren,
>>
>> You can get the local id a couple different ways. During run time, the SGE
>> job id is given in the JSON job description as the "localJobID" field. You
>> can also get it from the *.out file in the work directory. For example, for
>> job 32545, you can list the output folder by calling:
>>
>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/
>>
>> That will tell you that the work folder contains another folder called
>> lonestar, so calling:
>>
>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/lonestar
>>
>> will list a bunch of other files generated during execution. Browsing them
>> shows that you had an output file called
>> imicrobe-blast-2225-simap-32545.out. Downloading that file using the
>> following url shows the scheduler gave the local job id several times in the
>> output log.
>>
>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/imicrobe-blast-2225-simap-32545.out
>>
>> The first and last are shown below.
>>
>> TACC: Setting memory limits for job 1537860 to unlimited KB
>>
>> ...
>>
>> ...
>>
>> TACC: Cleaning up after job: 1537860
>> TACC: Done.
>>
>> let me know if that helps.
>>
>>
>> -
>> Rion
>>
>> On Nov 2, 2013, at 3:10 PM, "Darren Boss" <dboss at email.arizona.edu> wrote:
>>
>> There are about 20 or so failed jobs, all with no message in the JSON
>> result. The job IDs of one run are 32544-32551. Is there a way to
>> figure out what the SGE id is in order to query a job on Lonestar
>> using qacct, or can someone else do some investigation to find out why
>> they failed?
>>
>> This type of job was working when launched from a script running on my
>> computer but now I'm moving them over to a condor node and had to make
>> a few changes to the scripts.
>>
>> Just to be clear, the status of all jobs is FAILED but there is no
>> descriptive message about why they failed.

