[Iplant-api-dev] agave, etc.

Barthelson, Roger A - (rogerab) rogerab at email.arizona.edu
Tue Aug 12 14:22:13 MST 2014


Okay, I found a combination that finally let it stage the inputs and put the job in the queue. This is my system:

{
"id": "rogerab-lonestarland",
"name": "Roger Barthelson Lonestar Account",
"status": "UP",
"type": "EXECUTION",
"description": "Where I run my HPC codes.",
"site": "tacc.xsede.org",
"executionType": "HPC",
"default": true,
"queues": [
{
"name": "normal",
"maxJobs": 400,
"maxNodes": 128,
"maxProcessorsPerNode": 12,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "24GB",
"customDirectives": " -A iPlant-Master",
"default": true
},
{
"name": "largemem",
"maxJobs": 300,
"maxNodes": 1,
"maxProcessorsPerNode": 24,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "999GB",
"customDirectives": " -A iPlant-Master",
"default": true
}
],
"login": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SSH",
"rootDir": "/",
"scratchDir": "/scratch/01685/rogerab/rogerab",
"homeDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD",
"default": true
}
},
"storage": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SFTP",
"rootDir": "/",
"scratchDir": "/scratch/01685/rogerab/rogerab",
"homeDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD"
}
},
"scheduler": "SGE",
"environment": "",
"startupScript": "./bashrc"
}

This may not be the best example, but it works for me.

Roger
--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/


On August 12, 2014 at 1:29:33 PM, Rion Dooley (dooley at tacc.utexas.edu) wrote:

Either scratchDir or workDir is sufficient. You don’t really need them both.

To be clear, those should be defined at the top level of your system description, not within the storage object.
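
For example, a minimal fragment would place them as siblings of the "storage" and "login" objects (the paths here are Roger's from elsewhere in this thread; the /work path assumes TACC's usual $WORK layout, so verify it on your account):

```json
{
  "id": "rogerab-lonestarland",
  "scratchDir": "/scratch/01685/rogerab",
  "workDir": "/work/01685/rogerab"
}
```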

--
Rion




On Aug 12, 2014, at 3:19 PM, Barthelson, Roger A - (rogerab) <rogerab at email.arizona.edu> wrote:

Hi Rion-
Thanks for the input. What you say is helpful, but I don’t know if that is a current view of my system (I keep changing them). I do have a scratch directory defined, or at least I tried to. I previously had both a work and a home directory defined and was getting the same error, so I thought I might avoid the behavior you just described (trying to use the home directory) by not listing one. But I still had a non-empty scratch directory defined:

{
"id": "rogerab-lonestarland",
"name": "Roger Barthelson Lonestar Account",
"status": "UP",
"type": "EXECUTION",
"description": "Where I run my HPC codes.",
"site": "tacc.xsede.org",
"executionType": "HPC",
"default": true,
"queues": [
{
"name": "normal",
"maxJobs": 400,
"maxNodes": 128,
"maxProcessorsPerNode": 12,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "24GB",
"customDirectives": " -A iPlant-Master",
"default": true
},
{
"name": "largemem",
"maxJobs": 300,
"maxNodes": 1,
"maxProcessorsPerNode": 24,
"maxRequestedTime": "24:00:00",
"maxMemoryPerNode": "999GB",
"customDirectives": " -A iPlant-Master",
"default": true
}
],
"login": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SSH",
"scratchDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD",
"default": true
}
},
"storage": {
"host": "lonestar.tacc.utexas.edu",
"port": 22,
"protocol": "SFTP",
"rootDir": "/",
"scratchDir": "/scratch/01685/rogerab",
"auth": {
"username": "rogerab",
"password": "PASSWORD",
"type": "PASSWORD"
}
},
"scheduler": "SGE",
"environment": "",
"startupScript": "./bashrc"
}

I just hoped it would use the only directory defined, but I guess that didn’t work. And if this is what is registered in Agave for my rogerab-lonestarland, it is not correct!

Roger
--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/


On August 12, 2014 at 1:04:34 PM, Rion Dooley (dooley at tacc.utexas.edu) wrote:

Hi Roger,

The issue is that the paths you’re trying to use don’t work. Here is your system description:

$ systems-list -v rogerab-lonestarland
{
    "_links": {
        "credentials": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/credentials"
        },
        "metadata": {
            "href": "https://agave.iplantc.org/meta/v2/data/?q={\"associationIds\":\"0001390692364782-5056a550b8-0001-006\"}"
        },
        "roles": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland/roles"
        },
        "self": {
            "href": "https://agave.iplantc.org/systems/v2/rogerab-lonestarland"
        }
    },
    "description": "Where I run my HPC codes.",
    "environment": null,
    "executionType": "HPC",
    "id": "rogerab-lonestarland",
    "lastModified": "2014-08-12T12:47:25.000-05:00",
    "login": {
        "auth": {
            "type": "PASSWORD"
        },
        "host": "lonestar.tacc.utexas.edu",
        "port": 22,
        "protocol": "SSH",
        "proxy": null
    },
    "maxSystemJobs": 2147483647,
    "maxSystemJobsPerUser": 2147483647,
    "name": "Roger Barthelson Lonestar Account",
    "public": false,
    "queues": [
        {
            "customDirectives": " -A iPlant-Master",
            "default": false,
            "maxJobs": 400,
            "maxMemoryPerNode": 24,
            "maxNodes": 128,
            "maxProcessorsPerNode": 12,
            "maxUserJobs": -1,
            "name": "normal"
        },
        {
            "customDirectives": " -A iPlant-Master",
            "default": true,
            "maxJobs": 300,
            "maxMemoryPerNode": 999,
            "maxNodes": 1,
            "maxProcessorsPerNode": 24,
            "maxUserJobs": -1,
            "name": "largemem"
        }
    ],
    "revision": 16,
    "scheduler": "SGE",
    "scratchDir": "",
    "site": "tacc.xsede.org",
    "startupScript": "./bashrc",
    "status": "UP",
    "storage": {
        "auth": {
            "type": "PASSWORD"
        },
        "homeDir": null,
        "host": "lonestar.tacc.utexas.edu",
        "mirror": false,
        "port": 22,
        "protocol": "SFTP",
        "proxy": null,
        "rootDir": "/"
    },
    "type": "EXECUTION",
    "uuid": "0001390692364782-5056a550b8-0001-006",
    "workDir": ""
}


The relevant parts to debug the problem are:

storage.homeDir = null
storage.rootDir = "/"
scratchDir = ""
workDir = ""

The issue is that when you submit a job, Agave creates a directory into which your inputs are staged and all job assets are copied; that directory becomes `pwd` when your job runs. On your rogerab-lonestarland system, storage.rootDir is set to "/" and storage.homeDir is set to null, which means that, in the eyes of Agave, your home directory and your root directory are the same path, "/". There’s nothing wrong with that per se, but your scratchDir and workDir are also both set to "", so Agave falls back to creating the temporary job directory in your home directory, "/". Because you don’t have permission to create directories in "/" on TACC's Lonestar cluster (which is where your system is pointing), the file staging fails. That is why the history log below says, "...Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland".
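
The fallback order described above can be sketched in a few lines (this is only an illustration of the behavior as described in this thread, not Agave's actual implementation, and the function name is made up):

```python
def job_staging_dir(scratch_dir, work_dir, home_dir, root_dir="/"):
    """Illustrative sketch of the fallback order described above
    (not Agave's actual code): prefer scratchDir, then workDir,
    then the home directory, which itself defaults to rootDir."""
    for candidate in (scratch_dir, work_dir):
        if candidate:  # an empty string "" means "not set"
            return candidate
    return home_dir or root_dir

# Roger's system: scratchDir and workDir are both "", homeDir is null,
# so staging falls back to rootDir ("/"), where mkdir is not permitted.
print(job_staging_dir("", "", None))  # prints "/"
```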

$ jobs-history -d 0001407866329350-5056a550b8-0001-007
Job accepted and queued for submission.
Attempt 1 to stage job inputs
Identifying input files for staging
Attempt 1 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 2 to stage job inputs
Identifying input files for staging
Attempt 2 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Attempt 3 to stage job inputs
Identifying input files for staging
Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland
Cleaning up remote work directory.
Completed cleaning up remote work directory.
Unable to stage inputs for job after 3 attempts. Job cancelled.

To fix this problem, either set your system’s storage.homeDir to your actual home directory, or set scratchDir and/or workDir to folders where you have write access. In your case, those should be /scratch/01685/rogerab and /work/01685/rogerab, respectively.
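
As a sketch, the relevant fragment of the updated description might look like this (the rest of the description stays as it is; pointing homeDir at the scratch space rather than the true $HOME is just one workable choice, and the /work path should be verified on your account):

```json
{
  "scratchDir": "/scratch/01685/rogerab",
  "workDir": "/work/01685/rogerab",
  "storage": {
    "host": "lonestar.tacc.utexas.edu",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/",
    "homeDir": "/scratch/01685/rogerab"
  }
}
```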

I hope this helps. Let me know if you need help updating the different paths. I hate to see you stuck for so long on stuff like this.

--
Rion



On Aug 12, 2014, at 2:43 PM, Barthelson, Roger A - (rogerab) <rogerab at email.arizona.edu> wrote:

I keep running into the same problem when I try to run a job with Agave. I am told that the inputs could not be staged; specifically, that a job directory could not be created on my system, e.g.:

Attempt 3 failed to stage job inputs. Failed to create the remote job directory rogerab/job-0001407866329350-5056a550b8-0001-007-newbler-_newbler-26__test2 on rogerab-lonestarland

I’m not sure why this should be the case. I defined a scratch directory for my system, and the login is correct.
In any case, the result is that the job fails. It fails whether I try to run it via a JSON file and the CLI, or in the DE.

That is my current blocking point.

Roger

--
Roger Barthelson Ph.D.
Scientific Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: www.iplantcollaborative.org/
_______________________________________________
Iplant-api-dev Mailing List: Iplant-api-dev at iplantcollaborative.org
List Info and Archives: http://mail.iplantcollaborative.org/mailman/listinfo/iplant-api-dev
One-click Unsubscribe: http://mail.iplantcollaborative.org/mailman/options/iplant-api-dev/dooley%40tacc.utexas.edu?unsub=1&unsubconfirm=1


