[Iplant-api-dev] Jobs stuck at "CLEANING_UP" - Re: Jobs Stuck At Pending

Ghiban, Cornel ghiban at cshl.edu
Tue Jul 14 09:52:25 MST 2015


Can confirm this is happening for iPlant too. I have about 15 jobs in
CLEANING_UP, some from yesterday.

Cheers,
Cornel


On Tue, 2015-07-14 at 09:48 -0700, Brian Corrie wrote:
> Hi Rion, Others,
> 
> Is this still a problem? We are seeing jobs running but getting 
> stuck in the "CLEANING_UP" state. It looks like the data is not being 
> staged out of the execution system (but no error messages that I can 
> see). Example from yesterday is the following job:
> 
> 5800971945277582875-e0bd34dffff8de6-0001-007
> 
> This is on the irec tenant if that makes any difference. The job is 
> completing successfully but the data is not being staged out. The output 
> of the job is in the staging folder on the execution system:
> 
> bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> ls
> app.sh
> histogram.jpg
> histogram.m
> irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.err
> irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.out
> irec-job-86.ipcexe
> preprocess.py
> test.sh
> 
> There are error and output files, as well as the resulting image from 
> the job. So the job was queued and ran to completion as reported by 
> AGAVE. The job history is below.
> 
> Any thoughts?
> 
> Brian
> 
> 
> bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> jobs-history -V 5800971945277582875-e0bd34dffff8de6-0001-007
> 
> {
>    "status" : "success",
>    "message" : null,
>    "version" : "2.1.3-r8accb",
>    "result" : [ {
>      "created" : "2015-07-13T18:48:42.000-05:00",
>      "status" : "PENDING",
>      "description" : "Job accepted and queued for submission."
>    }, {
>      "created" : "2015-07-13T18:56:41.000-05:00",
>      "status" : "PROCESSING_INPUTS",
>      "description" : "Identifying input files for staging"
>    }, {
>      "created" : "2015-07-13T18:56:41.000-05:00",
>      "status" : "PROCESSING_INPUTS",
>      "description" : "Attempt 1 to stage job inputs"
>    }, {
>      "progress" : {
>        "averageRate" : 634016,
>        "totalFiles" : 1,
>        "source" : "agave://system-staging-irec-bcorrie/2015-07-13_16-48-33_55a44e516107c/data.csv.zip",
>        "totalActiveTransfers" : 0,
>        "totalBytes" : 2536066,
>        "totalBytesTransferred" : 2536066
>      },
>      "created" : "2015-07-13T18:56:44.000-05:00",
>      "status" : "STAGING_INPUTS",
>      "description" : "Copy in progress"
>    }, {
>      "created" : "2015-07-13T18:56:49.000-05:00",
>      "status" : "STAGED",
>      "description" : "Job inputs staged to execution system"
>    }, {
>      "created" : "2015-07-13T18:56:50.000-05:00",
>      "status" : "SUBMITTING",
>      "description" : "Attempt 1 to submit job"
>    }, {
>      "created" : "2015-07-13T18:56:50.000-05:00",
>      "status" : "SUBMITTING",
>      "description" : "Preparing job for submission."
>    }, {
>      "created" : "2015-07-13T18:56:52.000-05:00",
>      "status" : "STAGING_JOB",
>      "description" : "Fetching app assets from agave://system-deploy-irec-bcorrie/histogram"
>    }, {
>      "created" : "2015-07-13T18:57:05.000-05:00",
>      "status" : "STAGING_JOB",
>      "description" : "Staging runtime assets to agave://system-exec--bugaboo-westgrid-ca-bcorrie/bcorrie/job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86"
>    }, {
>      "created" : "2015-07-13T18:57:21.000-05:00",
>      "status" : "QUEUED",
>      "description" : "HPC job successfully placed into pre queue as local job 22837661"
>    }, {
>      "created" : "2015-07-13T19:01:36.000-05:00",
>      "status" : "RUNNING",
>      "description" : "Job started running"
>    }, {
>      "created" : "2015-07-13T19:01:49.000-05:00",
>      "status" : "CLEANING_UP"
>    } ]
> }
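
A history like the one above can be checked for a stalled job with a few lines of Python. This is only a sketch, not part of Agave or its CLI: it assumes the `jobs-history -V` output has been saved as JSON with the `result` / `created` / `status` fields shown above, and the helper names (`last_event`, `is_stuck`) and the one-hour threshold are made up for illustration. The embedded history is an abbreviated copy of the last three events.

```python
import json
from datetime import datetime, timedelta, timezone

# Abbreviated copy of the job history above (last three events only).
HISTORY_JSON = """
{
  "status": "success",
  "result": [
    {"created": "2015-07-13T18:57:21.000-05:00", "status": "QUEUED"},
    {"created": "2015-07-13T19:01:36.000-05:00", "status": "RUNNING"},
    {"created": "2015-07-13T19:01:49.000-05:00", "status": "CLEANING_UP"}
  ]
}
"""


def last_event(history):
    """Return (status, timestamp) of the most recent history entry."""
    latest = max(
        history["result"],
        key=lambda event: datetime.fromisoformat(event["created"]),
    )
    return latest["status"], datetime.fromisoformat(latest["created"])


def is_stuck(history, now, threshold=timedelta(hours=1),
             stuck_status="CLEANING_UP"):
    """True if the job has sat in stuck_status for longer than threshold.

    The one-hour default is an arbitrary assumption, not an Agave setting.
    """
    status, since = last_event(history)
    return status == stuck_status and now - since > threshold


history = json.loads(HISTORY_JSON)
status, since = last_event(history)

# Time of the report in this thread: Tue, 2015-07-14 09:48 -0700.
now = datetime(2015, 7, 14, 9, 48, tzinfo=timezone(timedelta(hours=-7)))

print(status, now - since, is_stuck(history, now))
```

Because the timestamps carry their UTC offsets, the comparison works even though the history is reported at -05:00 and the mail was sent at -07:00.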
> 
> 
> On 11/07/2015 8:50 AM, Rion Dooley wrote:
> > <expletive>
> > I’m aware and working on it. Your jobs are getting submitted twice and the latter is overwriting the former.
> > </expletive>
> >
> > —
> > Rion
> >
> >> On Jul 11, 2015, at 10:47 AM, Duvick, Jonathan P [GDCBS] <jduvick at iastate.edu> wrote:
> >>
> >> Thanks for your work; my test jobs are completing but output is empty, and I'm seeing a lot of the 'Text file busy' messages like before...
> >>
> >> Example:
> >>
> >> job_id 5306259608984949221-e0bd34dffff8de6-0001-007
> >>
> >> Also, 500 errors are pretty common when sending API requests to jobs and output/listings.
> >>
> >> (Separate issue: the Arizona ntp server appears to be running 7 hours fast.)
> >>
> >> Thanks,
> >>
> >> Jon Duvick
> >> PlantGDB Manager
> >> http://www.plantgdb.org/
> >> Department of Genetics, Development and Cell Biology
> >> 2258 Molecular Biology Building
> >> Iowa State University
> >> Ames IA 50011
> >>
> >> (515) 294-2360
> >> (515) 294-6755 FAX
> >> ________________________________________
> >> From: iplant-api-dev-bounces at iplantcollaborative.org <iplant-api-dev-bounces at iplantcollaborative.org> on behalf of John Fonner <jfonner at tacc.utexas.edu>
> >> Sent: Friday, July 10, 2015 3:13 AM
> >> To: Duvick, Jonathan P [GDCBS]
> >> Cc: Discussion of iPlant API development
> >> Subject: Re: [Iplant-api-dev] Jobs Stuck At Pending
> >>
> >> Hi everyone,
> >>
> >> Just wanted to send out a quick status update.  Agave is back online and
> >> everything appears to be healthy.  Jobs are flowing, and hopefully Rion
> >> can end his vigil and catch some sleep.  More info on bug fixes and
> >> features will come next week.  For now, though, please test things out and
> >> let us know if the service is working for you.
> >>
> >> Thanks to the Agave team for all the hard work!
> >>
> >> Thanks,
> >> Fonner
> >>
> >> On 7/9/15, 12:21 PM, "iplant-api-dev-bounces at iplantcollaborative.org on
> >> behalf of Brian Corrie" <iplant-api-dev-bounces at iplantcollaborative.org on
> >> behalf of bcorrie at sfu.ca> wrote:
> >>
> >>> Thanks for the update Rion... We will leave you alone, good luck... 8-)
> >>>
> >>> Brian
> >>>
> >>> On 09/07/2015 10:18 AM, Rion Dooley wrote:
> >>>> Submission is paused due to a critical issue in the api stemming from
> >>>> several factors. We have been working around the clock to address the
> >>>> problem and restore full service to iPlant as well as several other
> >>>> tenants. We have identified the problem, written a patch, and are
> >>>> currently load testing it on our staging servers. We hope to have
> >>>> service back up after lunch.
> >>>>
> >>>> --
> >>>> Rion
> >>>>
> >>>>> On Jul 9, 2015, at 12:13 PM, Barthelson, Roger A - (rogerab)
> >>>>> <rogerab at email.arizona.edu <mailto:rogerab at email.arizona.edu>> wrote:
> >>>>>
> >>>>> All jobs I have run from the DE in the past 4 days have stalled at
> >>>>> submitted. Most of the time the data gets staged, but then nothing
> >>>>> happens. No outputs, logs, etc returned. Jobs are still listed in the
> >>>>> DE as submitted, but I think they have essentially failed because of
> >>>>> system errors.
> >>>>>
> >>>>> Roger
> >>>>> --
> >>>>> Roger Barthelson Ph.D.
> >>>>> Scientific Analyst
> >>>>> iPlant Collaborative
> >>>>> BIO5 Institute, University of Arizona
> >>>>> Phone: 520-977-5249
> >>>>> Email: rogerab at email.arizona.edu <mailto:rogerab at email.arizona.edu>
> >>>>> Web: www.iplantcollaborative.org/ <http://www.iplantcollaborative.org/>
> >>>>>
> >>>>> On July 9, 2015 at 8:24:55 AM, Fritz-Waters, Eric R [AN S]
> >>>>> (ercfrtz at iastate.edu <mailto:ercfrtz at iastate.edu>) wrote:
> >>>>>
> >>>>>> I submitted some jobs via the api on Tuesday. They are still stuck at
> >>>>>> the Pending status. I also have some previous jobs that are stuck at
> >>>>>> the states they are in, both Staged and Pending.
> >>>>>>
> >>>>>> -Eric Fritz-Waters
> >>>>>> _______________________________________________
> >>>>>> Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
> >>>>>> <mailto:Iplant-api-dev at iplantcollaborative.org>
> >>>>>> List Info and Archives:
> >>>>>> http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
> >>>>>> One-click Unsubscribe:
> >>>>>>
> >>>>>> http://mail.iplantcollaborative.org/mailman/options/iplant-api-dev/roge
> >>>>>> rab%40email.arizona.edu?unsub=1&unsubconfirm=1
> >>>>>>
> >>
> >>
> >
> >



