[Iplant-api-dev] Jobs intermittently failing at archiving step

Rion Dooley dooley at tacc.utexas.edu
Fri Aug 9 09:00:45 MST 2013


Hey Roger,

The max queue time for Lonestar is 24 hours. Ditto for Stampede. You can exclude files from archiving by adding their paths to the .archive file in your job directory. The .archive file is created as part of your job setup process, and the API consults it before archiving. Doing a simple

$> echo output/some_file_to_ignore.png >> .archive

will append the file path to the end of the .archive file and tell the API not to move that file to your archive folder. All paths should be relative to the job directory (which is also `pwd` when the job is running).
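If there are many files to skip, the same idea can be scripted. The following is only a rough sketch, assuming the .archive file holds one path per line relative to the job directory (as in the echo example above). The function name exclude_large_files and the size threshold are made up for illustration and are not part of the API.

```shell
#!/bin/sh
# Hypothetical helper (not part of the API): append every file under the
# job directory that is larger than MIN_BYTES to the .archive ignore list.
# Assumes .archive holds one path per line, relative to the job directory.
exclude_large_files() {
    job_dir=$1
    min_bytes=$2
    (
        cd "$job_dir" || exit 1
        # find prints ./output/foo; strip the leading "./" so each path is
        # relative to the job directory.
        find . -type f -size +"${min_bytes}"c ! -name .archive |
        sed 's|^\./||' |
        while IFS= read -r path; do
            # Append only if the exact path is not already listed.
            grep -qxF -- "$path" .archive 2>/dev/null ||
                printf '%s\n' "$path" >> .archive
        done
    )
}
```

Calling something like exclude_large_files . 1000000000 near the end of the wrapper script, before the job finishes, would then list everything over roughly 1 GB. It uses >> throughout, so whatever the job setup already wrote to .archive is left in place.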

Rion

ps - This is true in v1 and in v2.
________________________________
From: iplant-api-dev-bounces at iplantcollaborative.org [iplant-api-dev-bounces at iplantcollaborative.org] on behalf of Barthelson, Roger A - (rogerab) [rogerab at email.arizona.edu]
Sent: Friday, August 09, 2013 10:35 AM
To: Rion Dooley
Cc: Discussion of iPlant API development
Subject: Re: [Iplant-api-dev] Jobs intermittently failing at archiving step

Jobs do seem to fail sometimes during archiving. Sometimes they take too long to archive and time out during transfer, and I believe Internet transfer problems can contribute to that. I'm not an expert at all on this side of things, but I think we can probably limit this sort of problem by how we design our Apps. One thing is to remove any large intermediate files that will not be needed by the user. Sometimes you do need these files, and what users may want is hard to guess -- in which case you can make keeping them an option. Otherwise, I don't have any great ideas that can help with this. There can be anomalies that slow or break transfer, and sometimes it seems like an application is running normally, but you don't get a final output. In recent experience I've seen some applications just not be able to run successfully through all stages because of some limitation in the data set. I have also seen assemblers go through a lot of processing but produce no output -- careful inspection showed they just ran out of memory. So what you describe can have multiple causes, including possible transfer problems.
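As a rough illustration of the two app-design points above (optional cleanup of intermediates, and catching a run that produced no final output), a wrapper script could end with something like the following. This is only a sketch: the function names, the keep flag, and the file names in the usage note are hypothetical and would depend on the App.

```shell
#!/bin/sh
# Hypothetical helpers for the end of an App's wrapper script.

# Delete the given intermediate files or directories unless the user asked
# to keep them. The first argument is "1" to keep, anything else to delete.
cleanup_intermediates() {
    keep=$1
    shift
    if [ "$keep" = "1" ]; then
        return 0
    fi
    for path in "$@"; do
        rm -rf -- "$path"
    done
}

# Return non-zero if the expected final output is missing or empty, so a
# run that died quietly (for example, out of memory) is not mistaken for
# a successful one.
require_output() {
    if [ ! -s "$1" ]; then
        echo "missing or empty output: $1" >&2
        return 1
    fi
}
```

A wrapper might then call require_output contigs.fa || exit 1 followed by cleanup_intermediates "$KEEP_INTERMEDIATES" tmp_kmers/ scratch.bam, where KEEP_INTERMEDIATES would be exposed as an App option.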

Roger



On Aug 8, 2013, at 1:33 PM, "Jennewein, Douglas M" <Doug.Jennewein at usd.edu<mailto:Doug.Jennewein at usd.edu>> wrote:

We’ve had several jobs run normally, reach the Archiving step, and then occasionally fail, according to the Test Application.

However, the Failed job’s output is visible, for example: /apps-v1/job/19402/output/list, and indicates a successful run.  There are no obvious error messages.

Could we be doing something on our end that (sometimes) causes job archival to fail for otherwise successful runs?

Doug Jennewein
Research Computing Manager
Information Technology Services
The University of South Dakota
605.658.6068


