[Iplant-api-dev] Jobs stuck at "CLEANING_UP" - Re: Jobs Stuck At Pending

Brian Corrie bcorrie at sfu.ca
Tue Jul 14 09:48:36 MST 2015


Hi Rion, Others,

Is this still an open issue? We are seeing jobs run but then get 
stuck in the "CLEANING_UP" state. It looks like the data is not being 
staged out of the execution system (but there are no error messages 
that I can see). An example from yesterday is the following job:

5800971945277582875-e0bd34dffff8de6-0001-007

This is on the irec tenant if that makes any difference. The job is 
completing successfully but the data is not being staged out. The output 
of the job is in the staging folder on the execution system:

bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> ls
app.sh         irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.err  preprocess.py
histogram.jpg  irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.out  test.sh
histogram.m    irec-job-86.ipcexe

There are error and output files, as well as the resulting image from 
the job. So the job was queued and ran to completion as reported by 
AGAVE. The job history is below.

Any thoughts?

Brian


bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> jobs-history -V 5800971945277582875-e0bd34dffff8de6-0001-007

{
   "status" : "success",
   "message" : null,
   "version" : "2.1.3-r8accb",
   "result" : [ {
     "created" : "2015-07-13T18:48:42.000-05:00",
     "status" : "PENDING",
     "description" : "Job accepted and queued for submission."
   }, {
     "created" : "2015-07-13T18:56:41.000-05:00",
     "status" : "PROCESSING_INPUTS",
     "description" : "Identifying input files for staging"
   }, {
     "created" : "2015-07-13T18:56:41.000-05:00",
     "status" : "PROCESSING_INPUTS",
     "description" : "Attempt 1 to stage job inputs"
   }, {
     "progress" : {
       "averageRate" : 634016,
       "totalFiles" : 1,
       "source" : "agave://system-staging-irec-bcorrie/2015-07-13_16-48-33_55a44e516107c/data.csv.zip",
       "totalActiveTransfers" : 0,
       "totalBytes" : 2536066,
       "totalBytesTransferred" : 2536066
     },
     "created" : "2015-07-13T18:56:44.000-05:00",
     "status" : "STAGING_INPUTS",
     "description" : "Copy in progress"
   }, {
     "created" : "2015-07-13T18:56:49.000-05:00",
     "status" : "STAGED",
     "description" : "Job inputs staged to execution system"
   }, {
     "created" : "2015-07-13T18:56:50.000-05:00",
     "status" : "SUBMITTING",
     "description" : "Attempt 1 to submit job"
   }, {
     "created" : "2015-07-13T18:56:50.000-05:00",
     "status" : "SUBMITTING",
     "description" : "Preparing job for submission."
   }, {
     "created" : "2015-07-13T18:56:52.000-05:00",
     "status" : "STAGING_JOB",
     "description" : "Fetching app assets from agave://system-deploy-irec-bcorrie/histogram"
   }, {
     "created" : "2015-07-13T18:57:05.000-05:00",
     "status" : "STAGING_JOB",
     "description" : "Staging runtime assets to agave://system-exec--bugaboo-westgrid-ca-bcorrie/bcorrie/job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86"
   }, {
     "created" : "2015-07-13T18:57:21.000-05:00",
     "status" : "QUEUED",
     "description" : "HPC job successfully placed into pre queue as local job 22837661"
   }, {
     "created" : "2015-07-13T19:01:36.000-05:00",
     "status" : "RUNNING",
     "description" : "Job started running"
   }, {
     "created" : "2015-07-13T19:01:49.000-05:00",
     "status" : "CLEANING_UP"
   } ]
}
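
For anyone wanting to check a batch of jobs for the same symptom, here is a minimal Python sketch that reads a jobs-history response like the one above and flags a job whose most recent event is a non-terminal state that has not changed for a while. The function names, the one-hour threshold, and the list of terminal states are my assumptions for illustration; they are not taken from the Agave documentation or from this thread.

```python
import json
from datetime import datetime

# Assumed set of terminal states (hypothetical; adjust to your tenant).
TERMINAL_STATES = {"FINISHED", "FAILED", "STOPPED", "KILLED"}


def last_event(history_json):
    """Return (status, created) for the most recent event in the response."""
    events = json.loads(history_json)["result"]
    # A stable sort keeps the later entry last when two timestamps are equal.
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["created"]))
    latest = ordered[-1]
    return latest["status"], datetime.fromisoformat(latest["created"])


def looks_stuck(history_json, now, max_idle_seconds=3600):
    """True if the job is in a non-terminal state and idle past the threshold."""
    status, created = last_event(history_json)
    if status in TERMINAL_STATES:
        return False
    return (now - created).total_seconds() > max_idle_seconds
```

Passing the JSON text printed by jobs-history -V and a timezone-aware "now" to looks_stuck() would report the job above as stuck, since its last event is CLEANING_UP from the previous evening.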


On 11/07/2015 8:50 AM, Rion Dooley wrote:
> <expletive>
> I’m aware and working on it. Your jobs are getting submitted twice and the latter is overwriting the former.
> </expletive>
>
>> Rion
>
>> On Jul 11, 2015, at 10:47 AM, Duvick, Jonathan P [GDCBS] <jduvick at iastate.edu> wrote:
>>
>> Thanks for your work; my test jobs are completing but output is empty, and I'm seeing a lot of the 'Text file busy' messages like before...
>>
>> Example:
>>
>> job_id 5306259608984949221-e0bd34dffff8de6-0001-007
>>
>> Also, 500 errors are pretty common when sending API requests to jobs and output/listings.
>>
>> (Separate issue: the Arizona ntp server appears to be running 7 hours fast.)
>>
>> Thanks,
>>
>> Jon Duvick
>> PlantGDB Manager
>> http://www.plantgdb.org/
>> Department of Genetics, Development and Cell Biology
>> 2258 Molecular Biology Building
>> Iowa State University
>> Ames IA 50011
>>
>> (515) 294-2360
>> (515) 294-6755 FAX
>> ________________________________________
>> From: iplant-api-dev-bounces at iplantcollaborative.org <iplant-api-dev-bounces at iplantcollaborative.org> on behalf of John Fonner <jfonner at tacc.utexas.edu>
>> Sent: Friday, July 10, 2015 3:13 AM
>> To: Duvick, Jonathan P [GDCBS]
>> Cc: Discussion of iPlant API development
>> Subject: Re: [Iplant-api-dev] Jobs Stuck At Pending
>>
>> Hi everyone,
>>
>> Just wanted to send out a quick status update.  Agave is back online and
>> everything appears to be healthy.  Jobs are flowing, and hopefully Rion
>> can end his vigil and catch some sleep.  More info on bug fixes and
>> features will come next week.  For now, though, please test things out and
>> let us know if the service is working for you.
>>
>> Thanks to the Agave team for all the hard work!
>>
>> Thanks,
>> Fonner
>>
>> On 7/9/15, 12:21 PM, "iplant-api-dev-bounces at iplantcollaborative.org on
>> behalf of Brian Corrie" <iplant-api-dev-bounces at iplantcollaborative.org on
>> behalf of bcorrie at sfu.ca> wrote:
>>
>>> Thanks for the update Rion... We will leave you alone, good luck... 8-)
>>>
>>> Brian
>>>
>>> On 09/07/2015 10:18 AM, Rion Dooley wrote:
>>>> Submission is paused due to a critical issue in the api stemming from
>>>> several factors. We have been working around the clock to address the
>>>> problem and restore full service to iPlant as well as several other
>>>> tenants. We have identified the problem, written a patch, and are
>>>> currently load testing it on our staging servers. We hope to have
>>>> service back up after lunch.
>>>>
>>>> Rion
>>>>
>>>>> On Jul 9, 2015, at 12:13 PM, Barthelson, Roger A - (rogerab)
>>>>> <rogerab at email.arizona.edu> wrote:
>>>>>
>>>>> All jobs I have run from the DE in the past 4 days have stalled at
>>>>> submitted. Most of the time the data gets staged, but then nothing
>>>>> happens. No outputs, logs, etc returned. Jobs are still listed in the
>>>>> DE as submitted, but I think they have essentially failed because of
>>>>> system errors.
>>>>>
>>>>> Roger
>>>>> --
>>>>> Roger Barthelson Ph.D.
>>>>> Scientific Analyst
>>>>> iPlant Collaborative
>>>>> BIO5 Institute, University of Arizona
>>>>> Phone: 520-977-5249
>>>>> Email: rogerab at email.arizona.edu
>>>>> Web: www.iplantcollaborative.org/
>>>>>
>>>>> On July 9, 2015 at 8:24:55 AM, Fritz-Waters, Eric R [AN S]
>>>>> (ercfrtz at iastate.edu) wrote:
>>>>>
>>>>>> I submitted some jobs via the api on Tuesday. They are still stuck at
>>>>>> the Pending status. I also have some previous jobs that are stuck at
>>>>>> the states they are in, both Staged and Pending.
>>>>>>
>>>>>> -Eric Fritz-Waters
>>>>>> _______________________________________________
>>>>>> Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
>>>>>> List Info and Archives:
>>>>>> http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
>>>>>> One-click Unsubscribe:
>>>>>> http://mail.iplantcollaborative.org/mailman/options/iplant-api-dev/rogerab%40email.arizona.edu?unsub=1&unsubconfirm=1
>>>>>>
>>>>
>>>>
>>>>
>>
>>
>
>

