[Iplant-api-dev] Problems submitting fAPI jobs

Jennewein, Douglas M Doug.Jennewein at usd.edu
Wed Aug 20 07:40:05 MST 2014


Thanks, Rion.  The job staging failures seem to have been intermittent and stopped on August 13, but we've started noticing a different problem.

Jobs submitted both by djennewe and bioextract (including scheduled retries) have been stuck in the PENDING state since about 2:00 PM yesterday.


From: Rion Dooley [mailto:dooley at tacc.utexas.edu]
Sent: Tuesday, August 12, 2014 3:31 PM
To: Jennewein, Douglas M
Cc: iPlant API Developers Mailing List
Subject: Re: [Iplant-api-dev] Problems submitting fAPI jobs

Hi Doug,

It's really important that, when building your apps, you have a mechanism for retrying failed jobs. Oftentimes the underlying systems and networks will do weird things. That's nothing Foundation can predict or avoid. While it does retry several times on its own, if a head node goes down or gets overloaded, or the network misbehaves while it's retrying, the job will fail. It's up to you, as a developer, to retry jobs that inexplicably fail as part of your normal submission workflow.
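The retry mechanism described above can be sketched as a generic loop with exponential backoff and jitter. This is a minimal sketch, not Foundation or Agave client code: `submit_with_retry` and the `submit` callable are hypothetical names, `submit` is assumed to be your own wrapper around the job-submission request that raises on failure, and the attempt count and delays are arbitrary placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def submit_with_retry(
    submit: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `submit` until it succeeds or `max_attempts` tries are used up.

    `submit` is a hypothetical zero-argument callable wrapping your own
    job-submission request; it must raise an exception on failure.
    The last exception is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except Exception as exc:  # narrow to your client's error types
            if attempt == max_attempts:
                raise
            # Exponential backoff (2s, 4s, 8s, ... capped at max_delay),
            # scaled by random jitter so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")  # loop always returns or raises


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_submit() -> str:
        # Stand-in for a real submission call: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise IOError("Error completing remote execution")
        return "job-12345"

    job_id = submit_with_retry(flaky_submit, sleep=lambda s: None)
    print("submitted", job_id)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `time.sleep` is used. The jitter matters when many clients hit the same overloaded head node, since it spreads their retries out instead of sending them all at once.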

On another note, I've noticed that a lot of applications are submitting "test" jobs several times an hour. Not only is this a horrible use of public resources, it's also ineffective, because these jobs don't actually test anything other than whether the API is responsive, which over the past few months it has been well over 99% of the time. It would be better to ping the system directly (not recommended) or use Agave's monitoring service<http://agaveapi.co/live-docs/#!/monitors> to make this check for you. If you are worried about the actual API status, you can subscribe to notifications from the Agave Status page, hosted by status.io at http://status.agaveapi.co. The status page has guaranteed 100% uptime and accurately reflects current system statuses.

--
Rion



On Aug 12, 2014, at 3:15 PM, Jennewein, Douglas M <Doug.Jennewein at usd.edu> wrote:


I'm noticing problems submitting jobs to fAPI as djennewe (job id 57821). Job submission seems to fail with a Java IOException message such as "Failed to submit job 57821: java.io.IOException: Error completing remote execution after: ". Jobs had been working normally earlier today.

Doug

_______________________________________________
Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
List Info and Archives: http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
