[Pvfs2-developers] Unexpected flow protocol error using unequally
distribution of data with MPI
Sam Lang
slang at mcs.anl.gov
Tue Mar 13 13:22:46 EST 2007
Hi Julian,
The patch at the following link should fix the assert failure you
were seeing with alt-aio:
http://www.pvfs.org/fisheye/rdiff/PVFS?csid=MAIN:slang:20070313175623&u&N
The problem was that alt-aio was calling the notification callback
for each segment in the aiocb list. The AIO spec requires that all
the requests in the list complete before calling the notification
callback. So the last thread created now waits for the others to
finish before calling the callback.
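
To illustrate the idea, here is a minimal sketch using plain pthreads. It is not the actual alt-aio code: list_state, segment_worker, notify, do_segment_io and run_list are all made-up names, and the per-segment I/O is a stub. It only shows the "last thread joins the others, then fires the callback once" pattern.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_SEGMENTS 16

/* Hypothetical per-list state; none of these names come from PVFS. */
struct list_state {
    pthread_t threads[MAX_SEGMENTS];
    int nsegments;
    int callback_calls;          /* how many times the notification fired */
};

struct segment_arg {
    struct list_state *list;
    int index;                   /* position of this segment in the list */
};

/* Stand-in for the notification callback. */
static void notify(struct list_state *list)
{
    list->callback_calls++;
}

/* Stand-in for the I/O on one segment (a real worker would pread/pwrite). */
static void do_segment_io(int index)
{
    (void)index;
}

static void *segment_worker(void *p)
{
    struct segment_arg *arg = p;
    struct list_state *list = arg->list;

    do_segment_io(arg->index);

    /* Only the last thread created notifies, and only after all
     * earlier threads have finished their segments. */
    if (arg->index == list->nsegments - 1) {
        for (int i = 0; i < list->nsegments - 1; i++)
            pthread_join(list->threads[i], NULL);
        notify(list);
    }
    return NULL;
}

/* Runs one list of nsegments requests; returns the callback count. */
int run_list(int nsegments)
{
    struct list_state list = { .nsegments = nsegments, .callback_calls = 0 };
    struct segment_arg args[MAX_SEGMENTS];

    assert(nsegments >= 1 && nsegments <= MAX_SEGMENTS);
    for (int i = 0; i < nsegments; i++) {
        args[i].list = &list;
        args[i].index = i;
        int rc = pthread_create(&list.threads[i], NULL,
                                segment_worker, &args[i]);
        assert(rc == 0);
    }
    /* The last thread has joined all the others, so joining it suffices. */
    pthread_join(list.threads[nsegments - 1], NULL);
    return list.callback_calls;
}
```

With this structure, run_list(4) should report a single callback invocation no matter how many segments are in the list, whereas notifying from every worker would report one per segment.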
With this patch (and two servers), I'm able to run your test to
completion with alt-aio. Incidentally, when I used the default
TroveMethod (normal aio), I wasn't able to reproduce the IO errors
you were seeing.
If this patch gets your test working with alt-aio, it might point to
a bad aio implementation on your servers. Otherwise I'm not sure...
maybe I can get an account on your machines to debug that one.
I've also committed your test to CVS at
client/mpi-io/mpi-unbalanced-test. If you want to generalize it to
work with different unbalanced distributions, that would be great.
Thanks,
-sam
On Mar 12, 2007, at 11:14 AM, Julian Martin Kunkel wrote:
> Hi,
> here comes the output. I guess it could be a problem with an
> overflow of int values. I am sending just the interesting (last)
> parts of the output, because the two files have a total of
> 50 MByte :-)
>
> Best regards,
> Julian
> <node1>
> <node2>