[Iplant-api-dev] Problems submitting fAPI jobs
Rion Dooley
dooley at tacc.utexas.edu
Wed Aug 20 08:59:43 MST 2014
Hi Cornel,
Jobs are running again.
--
Rion
On Aug 20, 2014, at 10:13 AM, Ghiban, Cornel <ghiban at cshl.edu> wrote:
> Yes, I can confirm this. All jobs since yesterday after 3pm are in
> PENDING state.
>
> Thanks,
> Cornel
>
> On Wed, 2014-08-20 at 09:40 -0500, Jennewein, Douglas M wrote:
>> Thanks, Rion. The job staging failures seem to have been intermittent
>> and stopped on August 13, but we’ve started noticing a different
>> problem.
>>
>> Jobs submitted both by djennewe and bioextract (including scheduled
>> retries) have been stuck in the PENDING state since about 2:00 PM
>> yesterday afternoon.
>>
>> From: Rion Dooley [mailto:dooley at tacc.utexas.edu]
>> Sent: Tuesday, August 12, 2014 3:31 PM
>> To: Jennewein, Douglas M
>> Cc: iPlant API Developers Mailing List
>> Subject: Re: [Iplant-api-dev] Problems submitting fAPI jobs
>>
>> Hi Doug,
>>
>> It’s really important that, when building your apps, you have a
>> mechanism for retrying failed jobs. Oftentimes, the underlying
>> systems and networks will do weird things. That’s nothing Foundation
>> can predict or avoid. While it does retry several times on its own,
>> if a head node goes down, gets overloaded, or the network just does
>> weird things while it’s retrying, the job will fail. It’s up to you,
>> as a developer, to retry jobs that inexplicably fail as part of your
>> normal submission workflow.
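
The retry advice above could be sketched roughly as follows. This is only an illustration, not Foundation or Agave code: `submit_with_retry`, its parameters, and the caller-supplied `submit` function are hypothetical names, and the delays are arbitrary placeholders chosen so that retries do not hammer the service.

```python
import time
from typing import Callable, Optional, Tuple, Type


def submit_with_retry(
    submit: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call submit() until it succeeds or max_attempts is used up.

    submit is a hypothetical caller-supplied function that submits one
    job and returns its id, raising an exception when submission fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except retry_on as exc:
            last_error = exc
            if attempt == max_attempts:
                break
            # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)

    raise RuntimeError(
        f"job submission failed after {max_attempts} attempts"
    ) from last_error
```

The backoff between attempts is a design choice of this sketch: if a head node is overloaded, immediate retries are likely to fail the same way, so each wait is twice as long as the previous one up to a cap. Only the exception types listed in `retry_on` are retried; anything else is raised to the caller right away.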
>>
>> On another note, I’ve noticed that a lot of applications are
>> submitting “test” jobs several times an hour. Not only is this a
>> horrible use of public resources, it’s also ineffective because they
>> don’t actually test anything other than whether the API is responsive,
>> which over the past few months it has been well over 99% of the time.
>> It would be better to ping the system directly (not recommended), or
>> use Agave’s monitoring service to make this check for you. If you are
>> worried about the actual API status, then you can subscribe to
>> notifications from the Agave Status page, hosted by status.io at
>> http://status.agaveapi.co. The status page is guaranteed 100% uptime
>> and accurately reflects current system statuses.
>>
>> --
>> Rion
>>
>> On Aug 12, 2014, at 3:15 PM, Jennewein, Douglas M
>> <Doug.Jennewein at usd.edu> wrote:
>>
>> I'm noticing problems submitting jobs to fAPI as djennewe (job id
>> 57821). Job submission seems to fail with a java.io.IOException
>> message like "Failed to submit job 57821: java.io.IOException: Error
>> completing remote execution after: ". Jobs had been working normally
>> earlier today.
>>
>> Doug
>>
>> _______________________________________________
>> Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
>> List Info and Archives: http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
>> One-click Unsubscribe: http://mail.iplantcollaborative.org/mailman/options/iplant-api-dev/dooley%40tacc.utexas.edu?unsub=1&unsubconfirm=1