[Iplant-api-dev] Failed job but no error message
Darren Boss
dboss at email.arizona.edu
Mon Nov 4 08:47:40 MST 2013
I think what's happening is that my Foundation API authentication is timing
out, so I'm no longer able to monitor the jobs. The Condor job has finished
but the Foundation jobs are still running. Should be an easy fix in my
script.
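
A minimal sketch of what such a fix could look like: a polling loop that renews the token before it expires instead of reusing the first one forever. Everything here is an assumption for illustration. get_token and get_status are hypothetical callables that stand in for the real API requests, the one-hour token lifetime is a guess, and "FINISHED" is an assumed name for the successful terminal status.

```python
import time


def monitor_jobs(job_ids, get_token, get_status,
                 token_lifetime=3600, poll_interval=60,
                 terminal=("FINISHED", "FAILED", "ARCHIVING_FAILED"),
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until every job reaches a terminal status, renewing the token early.

    get_token() and get_status(token, job_id) are hypothetical callables
    supplied by the caller; the status names in `terminal` are assumptions.
    """
    pending = set(job_ids)
    results = {}
    token, issued = get_token(), clock()
    while pending:
        # Renew once 80% of the assumed lifetime has elapsed, before expiry.
        if clock() - issued >= 0.8 * token_lifetime:
            token, issued = get_token(), clock()
        # sorted() copies the set, so discarding while looping is safe.
        for job_id in sorted(pending):
            status = get_status(token, job_id)
            if status in terminal:
                results[job_id] = status
                pending.discard(job_id)
        if pending:
            sleep(poll_interval)
    return results
```

The clock and sleep parameters exist only so the loop can be exercised with a fake clock; in a real script the defaults would be used.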
On Sun, Nov 3, 2013 at 11:30 PM, Rion Dooley <dooley at tacc.utexas.edu> wrote:
> The job probably ran out of time before total completion. This happened more often in v1. Try bumping the requested time. I'm about to hop on a flight back from Athens at the moment, but I can look at this late Monday when I land in Austin again.
>
> -
> Rion
>
>> On Nov 3, 2013, at 11:22 PM, "Darren Boss" <dboss at email.arizona.edu> wrote:
>>
>> I'm also not able to query the accounting data for that particular job
>> on lonestar:
>>
>> login1$ qacct -j 1537860
>> error: job id 1537860 not found
>>
>> Looking at the blastout.1 file it does seem like the job was killed
>> before it finished.
>>
>>> On Sun, Nov 3, 2013 at 4:12 PM, Darren Boss <dboss at email.arizona.edu> wrote:
>>> Thank you. That helped out quite a bit.
>>>
>>> There are files listed in the output list that I do not have in irods;
>>> in fact I don't have any output files in the archive directory at
>>> all. It looks like the job executed correctly, judging by the download of
>>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/blastout.1.
>>> Why is the status failed and not archiving_failed? It seems like it
>>> ran without failure.
>>>
>>>> On Sun, Nov 3, 2013 at 10:18 AM, Rion Dooley <dooley at tacc.utexas.edu> wrote:
>>>> Hey Darren,
>>>>
>>>> You can get the local id a couple different ways. During run time, the SGE
>>>> job id is given in the JSON job description as the "localJobID" field. You
>>>> can also get it from the *.out file in the work directory. For example, for
>>>> job 32545, you can list the output folder by calling:
>>>>
>>>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/
>>>>
>>>> Which will tell you that the work folder contains another folder called
>>>> lonestar, so calling:
>>>>
>>>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/lonestar
>>>>
>>>> will list a bunch of other files generated during execution. Browsing them
>>>> shows that you had an output file called
>>>> imicrobe-blast-2225-simap-32545.out. Downloading that file using the
>>>> following url shows the scheduler gave the local job id several times in the
>>>> output log.
>>>>
>>>> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/imicrobe-blast-2225-simap-32545.out
>>>>
>>>> The first and last are shown below.
>>>>
>>>> TACC: Setting memory limits for job 1537860 to unlimited KB
>>>>
>>>> ...
>>>>
>>>> ...
>>>>
>>>> TACC: Cleaning up after job: 1537860
>>>> TACC: Done.
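
The lookup described above can be sketched in a few lines: build the download URL for the .out file, then pull the local job ID out of the "TACC:" lines. This is only an illustration. The URL pattern and the two log lines are taken from this message; the helper names output_url and local_job_ids are hypothetical, and the regular expression assumes the scheduler always writes the ID right after the word "job".

```python
import re

BASE = "https://foundation.iplantcollaborative.org/apps-v1/job"


def output_url(job_id, path):
    """Build the download URL for a file in a job's work folder."""
    return f"{BASE}/{job_id}/output/{path.lstrip('/')}"


# Matches both "... for job 1537860 ..." and "... after job: 1537860".
_TACC_JOB = re.compile(r"^TACC:.*\bjob:?\s+(\d+)", re.MULTILINE)


def local_job_ids(log_text):
    """Return the distinct scheduler job IDs found on TACC lines, in order."""
    seen = []
    for match in _TACC_JOB.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Fetching the text at output_url(32545, "lonestar/imicrobe-blast-2225-simap-32545.out") with an authenticated request and passing it to local_job_ids should give the ID to hand to qacct.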
>>>>
>>>> Let me know if that helps.
>>>>
>>>>
>>>> -
>>>> Rion
>>>>
>>>> On Nov 2, 2013, at 3:10 PM, "Darren Boss" <dboss at email.arizona.edu> wrote:
>>>>
>>>> There are about 20 or so failed jobs, all with no message in the JSON
>>>> result. The job IDs of one run are 32544-32551. Is there a way to
>>>> figure out what the SGE ID is in order to query the job on Lonestar
>>>> using qacct, or can someone else do some investigation to find out why
>>>> they failed?
>>>>
>>>> This type of job was working when launched from a script running on my
>>>> computer but now I'm moving them over to a condor node and had to make
>>>> a few changes to the scripts.
>>>>
>>>> Just to be clear, the status of all jobs is FAILED but there is no
>>>> descriptive message about why they failed.