[Pvfs2-developers] Unexpected flow protocol error using unequal distribution of data with MPI

Sam Lang slang at mcs.anl.gov
Tue Mar 13 13:22:46 EST 2007


Hi Julian,

The patch at the following link should fix the assert failure you  
were seeing with alt-aio:

http://www.pvfs.org/fisheye/rdiff/PVFS?csid=MAIN:slang:20070313175623&u&N

The problem was that alt-aio was calling the notification callback  
for each segment in the aiocb list.  The AIO spec requires that all  
the requests in the list complete before calling the notification  
callback.  So the last thread created now waits for the others to  
finish before calling the callback.
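
To illustrate the idea (this is only a rough sketch, not the code from the patch), here is one way the "last thread waits for the others" approach could look with plain pthreads.  All of the names (list_op, segment_worker, notify, run_list, MAX_SEGMENTS) are made up for the example, and the actual I/O for each segment is left out:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SEGMENTS 16   /* arbitrary cap chosen for this sketch */

struct list_op;

struct segment_arg {
    struct list_op *op;
    int index;
};

/* Hypothetical bookkeeping for one list of segments. */
struct list_op {
    int nsegments;
    pthread_t threads[MAX_SEGMENTS];
    struct segment_arg args[MAX_SEGMENTS];
    int done[MAX_SEGMENTS];   /* set by each worker when its segment finishes */
    int callback_calls;       /* how many times notify() ran */
    int done_at_callback;     /* segments finished when notify() ran */
};

/* Stand-in for the notification callback; it only records what it saw. */
static void notify(struct list_op *op)
{
    int i, n = 0;

    for (i = 0; i < op->nsegments; i++)
        n += op->done[i];
    op->done_at_callback = n;
    op->callback_calls++;
}

static void *segment_worker(void *p)
{
    struct segment_arg *arg = p;
    struct list_op *op = arg->op;
    int i;

    /* The read or write for this segment would go here. */
    op->done[arg->index] = 1;

    /* Only the last thread created notifies, and only after every
     * earlier thread has been joined. */
    if (arg->index == op->nsegments - 1) {
        for (i = 0; i < op->nsegments - 1; i++)
            pthread_join(op->threads[i], NULL);
        notify(op);
    }
    return NULL;
}

/* Starts one thread per segment and waits for the last one.
 * Returns 0 on success, nonzero on failure. */
int run_list(struct list_op *op, int nsegments)
{
    int i, j;

    assert(op != NULL);
    if (nsegments < 1 || nsegments > MAX_SEGMENTS)
        return -1;

    op->nsegments = nsegments;
    op->callback_calls = 0;
    op->done_at_callback = 0;
    for (i = 0; i < nsegments; i++)
        op->done[i] = 0;

    for (i = 0; i < nsegments; i++) {
        op->args[i].op = op;
        op->args[i].index = i;
        if (pthread_create(&op->threads[i], NULL,
                           segment_worker, &op->args[i]) != 0) {
            /* Clean up the threads already started; none of them is
             * the last one, so none of them joins the others. */
            for (j = 0; j < i; j++)
                pthread_join(op->threads[j], NULL);
            return -1;
        }
    }
    /* The last thread has already joined all the earlier ones. */
    return pthread_join(op->threads[nsegments - 1], NULL);
}
```

Calling run_list(&op, 4) on a struct list_op should leave op.callback_calls at 1 and op.done_at_callback at 4.  The earlier threads' handles are all stored before the last thread is created, which is why the last thread can safely join them.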

With this patch (and two servers), I'm able to run your test to  
completion with alt-aio.  Incidentally, when I used the default  
TroveMethod (normal aio), I wasn't able to reproduce the IO errors  
you were seeing.

If this patch gets your test working with alt-aio, that might point  
to a bad aio implementation on your servers.  Otherwise I'm not  
sure; maybe I can get an account on your machines to debug that one.

I've also committed your test to CVS at  
client/mpi-io/mpi-unbalanced-test.  If you want to generalize it to  
work with different unbalanced distributions, that would be great.

Thanks,

-sam

On Mar 12, 2007, at 11:14 AM, Julian Martin Kunkel wrote:

> Hi,
> here is the output. I guess it could be a problem with an overflow
> of int values. I am sending only the interesting (last) parts of
> the output, because the two files total 50 MByte :-)
>
> Best regards,
> Julian
> <node1>
> <node2>


