[Iplant-api-dev] Jobs stuck at "CLEANING_UP" - Re: Jobs Stuck At Pending
Duvick, Jonathan P [GDCBS]
jduvick at iastate.edu
Wed Jul 15 16:18:28 MST 2015
I have a job stuck at CLEANING UP. Curiously, when I pull the sequence of statuses and their associated messages from the JSON reply, the two are out of sync.
Normally associations are something like this:
STAGING JOB - 'Attempt 1 to submit job'
QUEUED - 'HPC job successfully placed into normal queue as local job 5512660'
RUNNING - 'Job started running'
CLEANING UP - 'No archive file found. Entire job directory will be archived.'
But here the associations are:
STAGING JOB - 'Attempt 1 to submit job'
QUEUED - 'Attempt 1 to submit job' (out of sync)
RUNNING - 'HPC job successfully placed into normal queue as local job 5512660'
CLEANING UP - 'Job started running'
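For anyone who wants to check their own jobs for the same shift, here is a minimal Python sketch. It only compares the two sequences listed above. The function name `lagging_entries` and the exact string comparison are my own assumptions for illustration, not anything Agave provides, and in practice the local job number would differ from job to job.

```python
# Status/message pairs copied from the two sequences above.
EXPECTED = [
    ("STAGING JOB", "Attempt 1 to submit job"),
    ("QUEUED", "HPC job successfully placed into normal queue as local job 5512660"),
    ("RUNNING", "Job started running"),
    ("CLEANING UP", "No archive file found. Entire job directory will be archived."),
]

OBSERVED = [
    ("STAGING JOB", "Attempt 1 to submit job"),
    ("QUEUED", "Attempt 1 to submit job"),
    ("RUNNING", "HPC job successfully placed into normal queue as local job 5512660"),
    ("CLEANING UP", "Job started running"),
]


def lagging_entries(expected, observed):
    """Return the statuses whose observed message is the one expected
    for the previous status, i.e. the message lags one step behind."""
    lagging = []
    for i in range(1, min(len(expected), len(observed))):
        status, message = observed[i]
        previous_expected_message = expected[i - 1][1]
        if message.strip() == previous_expected_message:
            lagging.append(status)
    return lagging


print(lagging_entries(EXPECTED, OBSERVED))
```

With the sequences above, every status after STAGING JOB is reported as lagging, which matches what I see by eye.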
Jon Duvick
PlantGDB Manager
http://www.plantgdb.org/
Department of Genetics, Development and Cell Biology
2258 Molecular Biology Building
Iowa State University
Ames IA 50011
(515) 294-2360
(515) 294-6755 FAX
________________________________________
From: iplant-api-dev-bounces at iplantcollaborative.org <iplant-api-dev-bounces at iplantcollaborative.org> on behalf of Ghiban, Cornel <ghiban at cshl.edu>
Sent: Tuesday, July 14, 2015 11:52 AM
To: Duvick, Jonathan P [GDCBS]
Cc: Discussion of iPlant API development
Subject: Re: [Iplant-api-dev] Jobs stuck at "CLEANING_UP" - Re: Jobs Stuck At Pending
Can confirm this is happening for iPlant too. I have about 15 jobs in
CLEANING_UP, some from yesterday.
Cheers,
Cornel
On Tue, 2015-07-14 at 09:48 -0700, Brian Corrie wrote:
> Hi Rion, Others,
>
> Is this still a problem? We are seeing jobs running but getting
> stuck in the "CLEANING_UP" state. It looks like the data is not being
> staged out of the execution system (but no error messages that I can
> see). Example from yesterday is the following job:
>
> 5800971945277582875-e0bd34dffff8de6-0001-007
>
> This is on the irec tenant if that makes any difference. The job is
> completing successfully but the data is not being staged out. The output
> of the job is in the staging folder on the execution system:
>
> bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> ls
> app.sh
> histogram.jpg
> histogram.m
> irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.err
> irec-job-86-5800971945277582875-e0bd34dffff8de6-0001-007.out
> irec-job-86.ipcexe
> preprocess.py
> test.sh
>
> There are error and output files, as well as the resulting image from
> the job. So the job was queued and ran to completion as reported by
> AGAVE. The job history is below.
>
> Any thoughts?
>
> Brian
>
>
> bcorrie at bugaboo:job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86> jobs-history -V 5800971945277582875-e0bd34dffff8de6-0001-007
>
> {
> "status" : "success",
> "message" : null,
> "version" : "2.1.3-r8accb",
> "result" : [ {
> "created" : "2015-07-13T18:48:42.000-05:00",
> "status" : "PENDING",
> "description" : "Job accepted and queued for submission."
> }, {
> "created" : "2015-07-13T18:56:41.000-05:00",
> "status" : "PROCESSING_INPUTS",
> "description" : "Identifying input files for staging"
> }, {
> "created" : "2015-07-13T18:56:41.000-05:00",
> "status" : "PROCESSING_INPUTS",
> "description" : "Attempt 1 to stage job inputs"
> }, {
> "progress" : {
> "averageRate" : 634016,
> "totalFiles" : 1,
> "source" :
> "agave://system-staging-irec-bcorrie/2015-07-13_16-48-33_55a44e516107c/data.csv.zip",
> "totalActiveTransfers" : 0,
> "totalBytes" : 2536066,
> "totalBytesTransferred" : 2536066
> },
> "created" : "2015-07-13T18:56:44.000-05:00",
> "status" : "STAGING_INPUTS",
> "description" : "Copy in progress"
> }, {
> "created" : "2015-07-13T18:56:49.000-05:00",
> "status" : "STAGED",
> "description" : "Job inputs staged to execution system"
> }, {
> "created" : "2015-07-13T18:56:50.000-05:00",
> "status" : "SUBMITTING",
> "description" : "Attempt 1 to submit job"
> }, {
> "created" : "2015-07-13T18:56:50.000-05:00",
> "status" : "SUBMITTING",
> "description" : "Preparing job for submission."
> }, {
> "created" : "2015-07-13T18:56:52.000-05:00",
> "status" : "STAGING_JOB",
> "description" : "Fetching app assets from agave://system-deploy-irec-bcorrie/histogram"
> }, {
> "created" : "2015-07-13T18:57:05.000-05:00",
> "status" : "STAGING_JOB",
> "description" : "Staging runtime assets to agave://system-exec--bugaboo-westgrid-ca-bcorrie/bcorrie/job-5800971945277582875-e0bd34dffff8de6-0001-007-irec-job-86"
> }, {
> "created" : "2015-07-13T18:57:21.000-05:00",
> "status" : "QUEUED",
> "description" : "HPC job successfully placed into pre queue as local job 22837661"
> }, {
> "created" : "2015-07-13T19:01:36.000-05:00",
> "status" : "RUNNING",
> "description" : "Job started running"
> }, {
> "created" : "2015-07-13T19:01:49.000-05:00",
> "status" : "CLEANING_UP"
> } ]
> }
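A history like the one above can be summarized with a short Python sketch. This is only an illustration: `HISTORY` is a hand-trimmed copy of the last four events from the reply, and `parse_created` and `summarize` are hypothetical helper names, not part of the Agave CLI or API. It reports the final status, which events carry no description, and how many seconds separate the last two events.

```python
from datetime import datetime

# Last four events, trimmed by hand from the jobs-history reply above.
HISTORY = [
    {"created": "2015-07-13T18:56:50.000-05:00", "status": "SUBMITTING",
     "description": "Attempt 1 to submit job"},
    {"created": "2015-07-13T18:57:21.000-05:00", "status": "QUEUED",
     "description": "HPC job successfully placed into pre queue as local job 22837661"},
    {"created": "2015-07-13T19:01:36.000-05:00", "status": "RUNNING",
     "description": "Job started running"},
    {"created": "2015-07-13T19:01:49.000-05:00", "status": "CLEANING_UP"},
]


def parse_created(value):
    """Parse the 'created' timestamp format used in the reply (Python 3.7+)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def summarize(history):
    """Return (last status, statuses lacking a description,
    seconds between the last two events or None)."""
    ordered = sorted(history, key=lambda e: parse_created(e["created"]))
    missing = [e["status"] for e in ordered if not e.get("description")]
    last = ordered[-1]
    gap = None
    if len(ordered) > 1:
        delta = parse_created(last["created"]) - parse_created(ordered[-2]["created"])
        gap = delta.total_seconds()
    return last["status"], missing, gap


print(summarize(HISTORY))
```

For this job the final CLEANING_UP event is the only one without a description, and it was recorded 13 seconds after RUNNING.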
>
>
> On 11/07/2015 8:50 AM, Rion Dooley wrote:
> > <expletive>
> > I’m aware and working on it. Your jobs are getting submitted twice and the latter is overwriting the former.
> > </expletive>
> >
> > —
> > Rion
> >
> >> On Jul 11, 2015, at 10:47 AM, Duvick, Jonathan P [GDCBS] <jduvick at iastate.edu> wrote:
> >>
> >> Thanks for your work; my test jobs are completing but output is empty, and I'm seeing a lot of the 'Text file busy' messages like before...
> >>
> >> Example:
> >>
> >> job_id 5306259608984949221-e0bd34dffff8de6-0001-007
> >>
> >> Also, 500 errors are pretty common when sending API requests to jobs and output/listings.
> >>
> >> (Separate issue: the Arizona ntp server appears to be running 7 hours fast.)
> >>
> >> Thanks,
> >>
> >> Jon Duvick
> >> PlantGDB Manager
> >> http://www.plantgdb.org/
> >> Department of Genetics, Development and Cell Biology
> >> 2258 Molecular Biology Building
> >> Iowa State University
> >> Ames IA 50011
> >>
> >> (515) 294-2360
> >> (515) 294-6755 FAX
> >> ________________________________________
> >> From: iplant-api-dev-bounces at iplantcollaborative.org <iplant-api-dev-bounces at iplantcollaborative.org> on behalf of John Fonner <jfonner at tacc.utexas.edu>
> >> Sent: Friday, July 10, 2015 3:13 AM
> >> To: Duvick, Jonathan P [GDCBS]
> >> Cc: Discussion of iPlant API development
> >> Subject: Re: [Iplant-api-dev] Jobs Stuck At Pending
> >>
> >> Hi everyone,
> >>
> >> Just wanted to send out a quick status update. Agave is back online and
> >> everything appears to be healthy. Jobs are flowing, and hopefully Rion
> >> can end his vigil and catch some sleep. More info on bug fixes and
> >> features will come next week. For now, though, please test things out and
> >> let us know if the service is working for you.
> >>
> >> Thanks to the Agave team for all the hard work!
> >>
> >> Thanks,
> >> Fonner
> >>
> >> On 7/9/15, 12:21 PM, "iplant-api-dev-bounces at iplantcollaborative.org on
> >> behalf of Brian Corrie" <iplant-api-dev-bounces at iplantcollaborative.org on
> >> behalf of bcorrie at sfu.ca> wrote:
> >>
> >>> Thanks for the update Rion... We will leave you alone, good luck... 8-)
> >>>
> >>> Brian
> >>>
> >>> On 09/07/2015 10:18 AM, Rion Dooley wrote:
> >>>> Submission is paused due to a critical issue in the api stemming from
> >>>> several factors. We have been working around the clock to address the
> >>>> problem and restore full service to iPlant as well as several other
> >>>> tenants. We have identified the problem, written a patch, and are
> >>>> currently load testing it on our staging servers. We hope to have
> >>>> service back up after lunch.
> >>>>
> >>>> --
> >>>> Rion
> >>>>
> >>>>> On Jul 9, 2015, at 12:13 PM, Barthelson, Roger A - (rogerab)
> >>>>> <rogerab at email.arizona.edu> wrote:
> >>>>>
> >>>>> All jobs I have run from the DE in the past 4 days have stalled at
> >>>>> submitted. Most of the time the data gets staged, but then nothing
> >>>>> happens. No outputs, logs, etc returned. Jobs are still listed in the
> >>>>> DE as submitted, but I think they have essentially failed because of
> >>>>> system errors.
> >>>>>
> >>>>> Roger
> >>>>> --
> >>>>> Roger Barthelson Ph.D.
> >>>>> Scientific Analyst
> >>>>> iPlant Collaborative
> >>>>> BIO5 Institute, University of Arizona
> >>>>> Phone: 520-977-5249
> >>>>> Email: rogerab at email.arizona.edu
> >>>>> Web: http://www.iplantcollaborative.org/
> >>>>>
> >>>>> On July 9, 2015 at 8:24:55 AM, Fritz-Waters, Eric R [AN S]
> >>>>> (ercfrtz at iastate.edu) wrote:
> >>>>>
> >>>>>> I submitted some jobs via the api on Tuesday. They are still stuck at
> >>>>>> the Pending status. I also have some previous jobs that are stuck at
> >>>>>> the states they are in, both Staged and Pending.
> >>>>>>
> >>>>>> -Eric Fritz-Waters
> >>>>>> _______________________________________________
> >>>>>> Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
> >>>>>> <mailto:Iplant-api-dev at iplantcollaborative.org>
> >>>>>> List Info and Archives:
> >>>>>> http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
> >>>>>> One-click Unsubscribe:
> >>>>>>
> >>>>>> http://mail.iplantcollaborative.org/mailman/options/iplant-api-dev/roge
> >>>>>> rab%40email.arizona.edu?unsub=1&unsubconfirm=1
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >
> >
> >