[Iplant-api-dev] MPI problems

Barthelson, Roger A - (rogerab) rogerab at email.arizona.edu
Thu Feb 21 10:47:53 MST 2013


Hi-

I am having difficulty setting up a bash script to run GeneSeqer on Stampede, or at least its behavior is not predictable for me. I had to substantially rework the wrapper that is supposed to run GeneSeqer (through the API) on Stampede, which is intended for use by PlantGDB. The MPI version of GeneSeqer is not that fast, and on Stampede it is ideally limited to about 8 MPI tasks. Volker Brendel, the developer, splits the mRNA files and the genome files and runs the pieces in separate instances to get more throughput out of the application. I've been trying to use the recommended ibrun syntax, which uses the host list to manage separate running instances. The recommended form is:

ibrun -n 16 -o 0  ./mpihello &
ibrun -n 16 -o 16 ./mpihello &

I've tried this sort of arrangement within my wrapper, but it doesn't seem to work: it runs the first line, then the second. What I really want is this version:

ibrun -n 8 -o 0 GeneSeqerMPI &
ibrun -n 8 -o 8 GeneSeqerMPI &

But this doesn't work either, of course. It runs the two steps in series.

What I found recently is that if I cut out the preceding steps of the wrapper and start the script with just the GeneSeqer steps, they run in parallel as I expected, except that the script won't run as a batch script; it fails immediately. If I start an interactive job and then run the script as a bash command, the two steps do run in parallel. But if I add back some of the preceding steps and, instead of writing out the commands literally, define variables and build the commands from them, the script runs as a batch job, but only in series again.
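
One shell behavior that could produce this pattern, offered only as a guess since the wrapper itself is not shown here: if the "&" ends up inside a variable's value, bash treats it as an ordinary argument after expansion rather than as the background operator, so the commands run one after another. The sketch below illustrates the difference. run_instance is a hypothetical stand-in for the ibrun launch that just sleeps, and the variable names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for an ibrun launch: sleeps, then reports its arguments.
run_instance() { sleep 2; echo "finished: $*"; }

# Case 1: "&" stored inside the variable. After expansion it is passed to the
# command as a plain argument, so each command runs in the foreground.
cmd_a="run_instance -n 8 -o 0 &"
cmd_b="run_instance -n 8 -o 8 &"
start=$SECONDS
$cmd_a
$cmd_b
serial_elapsed=$((SECONDS - start))

# Case 2: "&" written literally in the script, after the expansion.
# Both commands start in the background; wait blocks until both finish.
cmd_a="run_instance -n 8 -o 0"
cmd_b="run_instance -n 8 -o 8"
start=$SECONDS
$cmd_a &
$cmd_b &
wait
parallel_elapsed=$((SECONDS - start))

echo "serial: ${serial_elapsed}s, parallel: ${parallel_elapsed}s"
```

In the first case the stand-in also prints the "&" among its arguments, which is a quick way to check whether a wrapper is passing it through as text.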

So my best guess is that the ibrun commands are affected by the environment and may be interacting with the SLURM system. In any case, I'm not sure what to do with this. I've wasted a lot of time. For most of this (except the first example with two 16-processor instances), I have run the batch scripts with the following settings:
#SBATCH -J geneseqer          # Job name
#SBATCH -o gsq.%j.out         # Name of stdout output file (%j expands to jobId)
#SBATCH -p development        # Queue name
#SBATCH -N 1                  # Total number of nodes requested (16 cores/node)
#SBATCH -n 16                 # Total number of MPI tasks requested
#SBATCH -t 04:00:00           # Run time (hh:mm:ss)
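
Put together, the layout being aimed for would look roughly like the sketch below, which combines the settings above with the two 8-task ibrun lines. Two things in it are assumptions and not part of the wrapper as described: the trailing wait, added because a batch script that reaches its end while background commands are still running will exit and the job will end with it, and the fallback ibrun function, a hypothetical stand-in so the sketch can be tried on a machine without ibrun. The GeneSeqerMPI arguments are omitted, as in the lines earlier in this message.

```shell
#!/bin/bash
#SBATCH -J geneseqer          # Job name
#SBATCH -o gsq.%j.out         # Name of stdout output file (%j expands to jobId)
#SBATCH -p development        # Queue name
#SBATCH -N 1                  # Total number of nodes requested (16 cores/node)
#SBATCH -n 16                 # Total number of MPI tasks requested
#SBATCH -t 04:00:00           # Run time (hh:mm:ss)

# Assumption: hypothetical stand-in used only when ibrun is not on the PATH,
# so the sketch can be run off-cluster. On Stampede the real ibrun is used.
if ! command -v ibrun >/dev/null 2>&1; then
    ibrun() { sleep 2; echo "stand-in ibrun called with: $*"; }
fi

start=$SECONDS

# Two 8-task instances, offset into the host list, both in the background.
ibrun -n 8 -o 0 GeneSeqerMPI &
ibrun -n 8 -o 8 GeneSeqerMPI &

# Assumption: block here until both background instances have finished,
# so the script does not reach its end while they are still running.
wait

elapsed=$((SECONDS - start))
echo "both instances finished after ${elapsed}s"
```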

Any ideas?

Roger


Roger Barthelson Ph.D.
Bioinformatics Analyst
iPlant Collaborative
BIO5 Institute, University of Arizona
Phone: 520-977-5249
Email: rogerab at email.arizona.edu
Web: http://www.iplantcollaborative.org/
           http://bio-it.arizona.edu/Roger_Barthelson_Home.html
