[Iplant-api-dev] Failed job but no error message
Darren Boss
dboss at email.arizona.edu
Sun Nov 3 14:12:24 MST 2013
Thank you. That helped out quite a bit.
There are files listed in the output list that I do not have in irods;
in fact, I don't have any output files in the archive directory at
all. The job appears to have executed correctly, judging by the output
I downloaded from:
https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/blastout.1
Why is the status failed and not archiving_failed? It seems like it
ran without failure.
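
For the archives, here is a minimal sketch of the lookup Rion describes in
the quoted message: build the output URL for a file in the job's work
folder, then pull the scheduler's job id out of the *.out log text. The
function names and the regular expression are my own assumptions, based
only on the two "TACC:" lines quoted below; authentication and the actual
download are deliberately left out.

```python
import re

# Base URL taken from the links in this thread.
BASE = "https://foundation.iplantcollaborative.org/apps-v1/job"

# Assumed pattern: matches lines such as
#   "TACC: Setting memory limits for job 1537860 to unlimited KB"
#   "TACC: Cleaning up after job: 1537860"
JOB_LINE = re.compile(r"^TACC: .*\bjob:? (\d+)", re.MULTILINE)


def output_url(job_id, path):
    """Return the download URL for one file in a job's work folder."""
    return f"{BASE}/{job_id}/output/{path}"


def local_job_ids(log_text):
    """Return the distinct scheduler job ids found in the log text,
    in order of first appearance."""
    seen = []
    for match in JOB_LINE.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Fetch the text at output_url(32545, "lonestar/imicrobe-blast-2225-simap-32545.out")
with whatever authenticated client you already use, then pass it to
local_job_ids().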
On Sun, Nov 3, 2013 at 10:18 AM, Rion Dooley <dooley at tacc.utexas.edu> wrote:
> Hey Darren,
>
> You can get the local id a couple of different ways. During run time, the SGE
> job id is given in the JSON job description as the "localJobID" field. You
> can also get it from the *.out file in the work directory. For example, for
> job 32545, you can list the output folder by calling:
>
> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/
>
> That shows the work folder contains another folder called
> lonestar, so calling:
>
> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/list/lonestar
>
> will list the other files generated during execution. Browsing them
> shows that you had an output file called
> imicrobe-blast-2225-simap-32545.out. Downloading that file using the
> following URL shows that the scheduler printed the local job id several
> times in the output log.
>
> https://foundation.iplantcollaborative.org/apps-v1/job/32545/output/lonestar/imicrobe-blast-2225-simap-32545.out
>
> The first and last are shown below.
>
> TACC: Setting memory limits for job 1537860 to unlimited KB
>
> ...
>
> ...
>
> TACC: Cleaning up after job: 1537860
> TACC: Done.
>
> Let me know if that helps.
>
>
> -
> Rion
>
> On Nov 2, 2013, at 3:10 PM, "Darren Boss" <dboss at email.arizona.edu> wrote:
>
> There are about 20 or so failed jobs, all with no message in the json
> result. The job IDs of one run are 32544-32551. Is there a way to
> figure out what the sge id is in order to query the job on Lonestar
> using qacct, or can someone else do some investigation to find out why
> they failed?
>
> This type of job was working when launched from a script running on my
> computer but now I'm moving them over to a condor node and had to make
> a few changes to the scripts.
>
> Just to be clear, the status of all jobs is FAILED but there is no
> descriptive message about why they failed.
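
On the original qacct question: a minimal sketch, assuming the JSON job
description carries the "localJobID" field mentioned in Rion's reply. Where
that field sits in the document is not stated in this thread, so the helper
searches the whole structure instead of assuming a fixed path. The function
names are hypothetical. The result is an argument list for "qacct -j",
to be run on a Lonestar login node.

```python
import json


def find_key(node, key):
    """Depth-first search for the first non-null value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def qacct_command(job_json):
    """Build the qacct argument list for a job description given as JSON
    text, or return None when no localJobID is present."""
    local_id = find_key(json.loads(job_json), "localJobID")
    if local_id is None:
        return None
    return ["qacct", "-j", str(local_id)]
```

The returned list can be passed to subprocess.run() on a machine where
qacct is available. If the field is absent, for example because it is only
reported while the job is running, the function returns None and the *.out
log is the fallback.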