[Iplant-api-dev] agave, etc.

Barthelson, Roger A - (rogerab) rogerab at email.arizona.edu
Tue Aug 12 13:19:04 MST 2014


Hi Rion-
Thanks for the input. What you say is helpful, but I'm not sure that is a current view of my system (I keep changing them). I do have a scratch directory defined, or at least I tried to define one. I previously had a work and a home directory defined and was getting the same error, so I thought I might avoid the behavior you just described (trying to use the home directory) by not listing one. But I still had a non-empty scratch directory defined:

{
  "id": "rogerab-lonestarland",
  "name": "Roger Barthelson Lonestar Account",
  "status": "UP",
  "type": "EXECUTION",
  "description": "Where I run my HPC codes.",
  "site": "tacc.xsede.org",
  "executionType": "HPC",
  "default": true,
  "queues": [
    {
      "name": "normal",
      "maxJobs": 400,
      "maxNodes": 128,
      "maxProcessorsPerNode": 12,
      "maxRequestedTime": "24:00:00",
      "maxMemoryPerNode": "24GB",
      "customDirectives": " -A iPlant-Master",
      "default": true
    },
    {
      "name": "largemem",
      "maxJobs": 300,
      "maxNodes": 1,
      "maxProcessorsPerNode": 24,
      "maxRequestedTime": "24:00:00",
      "maxMemoryPerNode": "999GB",
      "customDirectives": " -A iPlant-Master",
      "default": true
    }
  ],
  "login": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SSH",
    "scratchDir": "/scratch/01685/rogerab",
    "auth": {
      "username": "rogerab",
      "password": "PASSWORD",
      "type": "PASSWORD",
      "default": true
    }
  },
  "storage": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/",
    "scratchDir": "/scratch/01685/rogerab",
    "auth": {
      "username": "rogerab",
      "password": "PASSWORD",
      "type": "PASSWORD"
    }
  },
  "scheduler": "SGE",
  "environment": "",
  "startupScript": "./bashrc"
}

I just hoped it would use the only directory defined, but apparently that didn't work. In any case, if this is what is registered in Agave for my rogerab-lonestarland system, it is not correct!

Roger
--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/


On August 12, 2014 at 1:04:34 PM, Rion Dooley (dooley at tacc.utexas.edu) wrote:

Hi Roger,

The issue is that the paths you’re trying to use don’t work. Here is your system description:

$ systems-list -v rogerab-lonestarland
{
    "_links": {
        "credentials": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/credentials"
        },
        "metadata": {
            "href": "https://agave.iplantc.org/meta/v2/data/?q={\"associationIds\":\"0001390692364782-5056a550b8-0001-006\"}"
        },
        "roles": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/roles"
        },
        "self": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland"
        }
    },
    "description": "Where I run my HPC codes.",
    "environment": null,
    "executionType": "HPC",
    "id": "rogerab-lonestarland",
    "lastModified": "2014-08-12T12:47:25.000-05:00",
    "login": {
        "auth": {
            "type": "PASSWORD"
        },
        "host": "lonestar.tacc.utexas.edu",
        "port": 22,
        "protocol": "SSH",
        "proxy": null
    },
    "maxSystemJobs": 2147483647,
    "maxSystemJobsPerUser": 2147483647,
    "name": "Roger Barthelson Lonestar Account",
    "public": false,
    "queues": [
        {
            "customDirectives": " -A iPlant-Master",
            "default": false,
            "maxJobs": 400,
            "maxMemoryPerNode": 24,
            "maxNodes": 128,
            "maxProcessorsPerNode": 12,
            "maxUserJobs": -1,
            "name": "normal"
        },
        {
            "customDirectives": " -A iPlant-Master",
            "default": true,
            "maxJobs": 300,
            "maxMemoryPerNode": 999,
            "maxNodes": 1,
            "maxProcessorsPerNode": 24,
            "maxUserJobs": -1,
            "name": "largemem"
        }
    ],
    "revision": 16,
    "scheduler": "SGE",
    "scratchDir": "",
    "site": "tacc.xsede.org",
    "startupScript": "./bashrc",
    "status": "UP",
    "storage": {
        "auth": {
            "type": "PASSWORD"
        },
        "homeDir": null,
        "host": "lonestar.tacc.utexas.edu",
        "mirror": false,
        "port": 22,
        "protocol": "SFTP",
        "proxy": null,
        "rootDir": "/"
    },
    "type": "EXECUTION",
    "uuid": "0001390692364782-5056a550b8-0001-006",
    "workDir": ""
}


The relevant parts to debug the problem are:

storage.homeDir = null
storage.rootDir = "/"
scratchDir = ""
workDir = ""

The issue is that when you submit a job, Agave creates a folder into which your inputs are staged and all job assets are copied. This folder becomes `pwd` when your job runs. On your rogerab-lonestarland system, you have storage.rootDir set to "/" and storage.homeDir set to null, which means that your home directory and root directory, in the eyes of Agave, are the same path, "/". There's nothing wrong with that per se, but your scratchDir and workDir are also both set to "", which means Agave will try to create the temporary job directory in your home directory, "/". Because you don't have permission to create directories in "/" on TACC's Lonestar cluster (which is where your system is pointing), the file staging fails. This is why the history log below says "...Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland".

$ jobs-history -d 0001407866329350-5056a550b8-0001-007
Job accepted and queued for submission.
Attempt 1 to stage job inputs
Identifying input files for staging
Attempt 1 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 2 to stage job inputs
Identifying input files for staging
Attempt 2 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 3 to stage job inputs
Identifying input files for staging
Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Cleaning up remote work directory.
Completed cleaning up remote work directory.
Unable to stage inputs for job after 3 attempts. Job cancelled.
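To make the fallback behavior above concrete, here is a small shell sketch. This is only an illustration of the behavior as described, not Agave's actual code; the function name and argument order are made up for the example:

```shell
#!/bin/sh
# Sketch of the directory fallback described above: the job base
# directory is taken from scratchDir, then workDir, then the storage
# homeDir, and finally falls back to storage.rootDir. The per-job
# folder <username>/job-<uuid>-<name> is created beneath that base.
resolve_job_base_dir() {
  # args: scratchDir workDir homeDir rootDir
  for d in "$1" "$2" "$3"; do
    if [ -n "$d" ]; then
      printf '%s\n' "$d"
      return
    fi
  done
  printf '%s\n' "$4"
}

# Roger's registered values: empty scratchDir/workDir, null homeDir, rootDir "/"
resolve_job_base_dir "" "" "" "/"   # prints "/": the job dir lands under /, which is not writable

# With scratchDir set, the job dir would land somewhere writable instead:
resolve_job_base_dir "/scratch/01685/rogerab" "" "" "/"
```

With all three directory fields empty, everything collapses to the storage rootDir, which is exactly what the empty-string fields in the systems-list output above produce.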

To fix this problem, either set your system's storage.homeDir to your actual home directory, or set scratchDir and/or workDir to folders where you have write access. In your case, those should be /scratch/01685/rogerab and /work/01685/rogerab, respectively.
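For example, based on the field names shown in the systems-list output above, the relevant portion of an updated system description might look like the following sketch. Note that scratchDir and workDir belong at the top level of the execution system description, not inside the login or storage blocks; the homeDir value here is an assumption about your Lonestar home path, so substitute your real one:

```json
{
  "scratchDir": "/scratch/01685/rogerab",
  "workDir": "/work/01685/rogerab",
  "storage": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/",
    "homeDir": "/home1/01685/rogerab",
    "auth": {
      "username": "rogerab",
      "password": "PASSWORD",
      "type": "PASSWORD"
    }
  }
}
```

Once the file is updated, re-registering the system with the CLI (e.g. systems-addupdate with the JSON file and the system id) should pick up the new paths.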

I hope this helps. Let me know if you need help updating the different paths. I hate to see you stuck for so long on stuff like this.

--
Rion



On Aug 12, 2014, at 2:43 PM, Barthelson, Roger A - (rogerab) <rogerab at email.arizona.edu> wrote:

I keep running into the same problem when I try to run a job with Agave.  I am told that the inputs could not be staged — specifically that a job directory could not be created on my system, e.g.:

Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland

I'm not sure why this should be the case. I defined a scratch directory for my system, and the login is correct.
In any case, the result is that the job fails. It fails whether I try to run it via a JSON file and the CLI, or in the DE.

That is my current blocking point.

Roger

--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/
_______________________________________________
Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
List Info and Archives: http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev


