[Iplant-api-dev] agave, etc.
Barthelson, Roger A - (rogerab)
rogerab at email.arizona.edu
Tue Aug 12 13:19:04 MST 2014
Hi Rion-
Thanks for the input. What you say is helpful, but I don't know whether that is a current view of my system (I keep changing it). I do have a scratch directory defined, or I tried to define one, anyway. I previously had both a work directory and a home directory defined and was getting the same error, so I thought I might avoid the behavior you just described (trying to use the home directory) by not listing one. But I still had a non-empty scratch directory defined:
{
"id": "rogerab-lonestarland",
"name": "Roger Barthelson Lonestar Account",
"status": "UP",
"type": "EXECUTION",
"description": "Where I run my HPC codes.",
"site": "tacc.xsede.org",
"executionType": "HPC",
"default": true,
"queues": [
{
"name": "normal",
"maxJobs": 400,
"maxNodes": 128,
"maxProcessorsPerNode": 12,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "24GB",
"customDirectives": " -A iPlant-Master",
"default": true
},
{
"name": "largemem",
"maxJobs": 300,
"maxNodes": 1,
"maxProcessorsPerNode": 24,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "999GB",
"customDirectives": " -A iPlant-Master",
"default": true
}
],
"login": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SSH",
"scratchDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD",
"default": true
}
},
"storage": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SFTP",
"rootDir": "/",
"scratchDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD"
}
},
"scheduler": "SGE",
"environment": "",
"startupScript": "./bashrc"
}
I just hoped it would use the only directory defined, but I guess that didn’t work. But if this is what is registered in Agave for my rogerab-lonestarland, it is not correct!
Roger
--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/
On August 12, 2014 at 1:04:34 PM, Rion Dooley (dooley at tacc.utexas.edu) wrote:
Hi Roger,
The issue is that the paths you’re trying to use don’t work. Here is your system description:
$ systems-list -v rogerab-lonestarland
{
"_links": {
"credentials": {
"href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/credentials"
},
"metadata": {
"href": "https://agave.iplantc.org/meta/v2/data/?q={\"associationIds\":\"0001390692364782-5056a550b8-0001-006\"}"
},
"roles": {
"href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/roles"
},
"self": {
"href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland"
}
},
"description": "Where I run my HPC codes.",
"environment": null,
"executionType": "HPC",
"id": "rogerab-lonestarland",
"lastModified": "2014-08-12T12:47:25.000-05:00",
"login": {
"auth": {
"type": "PASSWORD"
},
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SSH",
"proxy": null
},
"maxSystemJobs": 2147483647,
"maxSystemJobsPerUser": 2147483647,
"name": "Roger Barthelson Lonestar Account",
"public": false,
"queues": [
{
"customDirectives": " -A iPlant-Master",
"default": false,
"maxJobs": 400,
"maxMemoryPerNode": 24,
"maxNodes": 128,
"maxProcessorsPerNode": 12,
"maxUserJobs": -1,
"name": "normal"
},
{
"customDirectives": " -A iPlant-Master",
"default": true,
"maxJobs": 300,
"maxMemoryPerNode": 999,
"maxNodes": 1,
"maxProcessorsPerNode": 24,
"maxUserJobs": -1,
"name": "largemem"
}
],
"revision": 16,
"scheduler": "SGE",
"scratchDir": "",
"site": "tacc.xsede.org",
"startupScript": "./bashrc",
"status": "UP",
"storage": {
"auth": {
"type": "PASSWORD"
},
"homeDir": null,
"host": "lonestar.tacc.utexas.edu",
"mirror": false,
"port": 22,
"protocol": "SFTP",
"proxy": null,
"rootDir": "/"
},
"type": "EXECUTION",
"uuid": "0001390692364782-5056a550b8-0001-006",
"workDir": ""
}
The relevant parts for debugging the problem are:
storage.homeDir = null
storage.rootDir = "/"
scratchDir = ""
workDir = ""
When you submit a job, Agave creates a folder into which your inputs are staged and all job assets are copied. This folder becomes `pwd` when your job runs. On your rogerab-lonestarland system, storage.rootDir is set to "/" and storage.homeDir is set to null, which means that your home directory and root directory, in the eyes of Agave, are the same path, "/". There's nothing wrong with that, per se, but your scratchDir and workDir are also both set to "", which means Agave will try to create a temporary job directory in your home directory, "/". Because you don't have permission to create directories in "/" on TACC's Lonestar cluster (which is where your system is pointing), the file staging fails. This is why the history log below says "...Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland".
$ jobs-history -d 0001407866329350-5056a550b8-0001-007
Job accepted and queued for submission.
Attempt 1 to stage job inputs
Identifying input files for staging
Attempt 1 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 2 to stage job inputs
Identifying input files for staging
Attempt 2 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 3 to stage job inputs
Identifying input files for staging
Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Cleaning up remote work directory.
Completed cleaning up remote work directory.
Unable to stage inputs for job after 3 attempts. Job cancelled.
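The fallback order described above can be sketched as follows. This is an illustrative guess based on the behavior described in this thread, not Agave's actual implementation, and the function name is hypothetical:

```python
# Hypothetical sketch of how the job directory appears to be chosen,
# based on the behavior described above (NOT Agave's actual code).
def job_dir(root_dir, home_dir, scratch_dir, work_dir, username, job_name):
    # homeDir falls back to rootDir when unset
    effective_home = home_dir or root_dir
    # scratchDir is tried first, then workDir, then the home directory
    base = scratch_dir or work_dir or effective_home
    return base.rstrip("/") + "/" + username + "/" + job_name

# With this system's registered values, the job directory lands under "/",
# where the user cannot create directories:
print(job_dir("/", None, "", "", "rogerab", "job-example"))
# -> /rogerab/job-example

# With scratchDir set, it lands somewhere writable instead:
print(job_dir("/", None, "/scratch/01685/rogerab", "", "rogerab", "job-example"))
# -> /scratch/01685/rogerab/rogerab/job-example
```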
To fix this problem, either set your system's storage.homeDir to your actual home directory, or set scratchDir and/or workDir to folders where you have write access. In your case, those should be /scratch/01685/rogerab and /work/01685/rogerab, respectively.
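For reference, here is a sketch of how the description posted earlier in this thread might be corrected: scratchDir and workDir moved to the top level of the execution system (which is where they appear in the registered output), and only one queue marked as the default. This is an illustration assembled from the description already posted, so double-check it against the systems schema before registering:

```json
{
  "id": "rogerab-lonestarland",
  "name": "Roger Barthelson Lonestar Account",
  "status": "UP",
  "type": "EXECUTION",
  "description": "Where I run my HPC codes.",
  "site": "tacc.xsede.org",
  "executionType": "HPC",
  "default": true,
  "scratchDir": "/scratch/01685/rogerab",
  "workDir": "/work/01685/rogerab",
  "queues": [
    {
      "name": "normal",
      "maxJobs": 400,
      "maxNodes": 128,
      "maxProcessorsPerNode": 12,
      "maxRequestedTime": "24:00:00",
      "maxMemoryPerNode": "24GB",
      "customDirectives": " -A iPlant-Master",
      "default": true
    },
    {
      "name": "largemem",
      "maxJobs": 300,
      "maxNodes": 1,
      "maxProcessorsPerNode": 24,
      "maxRequestedTime": "24:00:00",
      "maxMemoryPerNode": "999GB",
      "customDirectives": " -A iPlant-Master",
      "default": false
    }
  ],
  "login": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SSH",
    "auth": {
      "username": "rogerab",
      "password": "PASSWORD",
      "type": "PASSWORD"
    }
  },
  "storage": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/",
    "auth": {
      "username": "rogerab",
      "password": "PASSWORD",
      "type": "PASSWORD"
    }
  },
  "scheduler": "SGE",
  "environment": "",
  "startupScript": "./bashrc"
}
```

Assuming you are using the same CLI you used for systems-list, you should be able to re-register it with systems-addupdate, passing the updated JSON file and the system id.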
I hope this helps. Let me know if you need help updating the different paths. I hate to see you stuck for so long on stuff like this.
--
Rion
On Aug 12, 2014, at 2:43 PM, Barthelson, Roger A - (rogerab) <rogerab at email.arizona.edu> wrote:
I keep running into the same problem when I try to run a job with Agave. I am told that the inputs could not be staged; specifically, that a job directory could not be created on my system, e.g.:
Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
I’m not sure why this should be the case. I defined a scratch directory for my system, and the login is correct.
In any case, the result is that the job fails. It fails whether I try to run it via a JSON file and the CLI, or in the DE.
That is my current blocking point.
Roger
--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/
_______________________________________________
Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
List Info and Archives: http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev