[Pvfs2-users] I/O error
Robert Latham
robl at mcs.anl.gov
Wed Feb 21 23:01:15 EST 2007
On Sat, Feb 17, 2007 at 02:32:13PM +0100, Michael Kuhn wrote:
> The program basically writes and reads data using combinations of
> (non-)collective and (non-)contiguous I/O. It seems this error only
> occurs if we do non-collective, contiguous I/O (level 0) with multiple
> processes. Less processes and other levels work just fine. (The number
> of iterations our program does also seems to play a role; 2 iterations
> work, 3 produce the error.)
With the test case this was easy to track down (thanks again!), but
it's proving harder to come up with a solid fix.
The problem is that I do not correctly handle incrementing the
independent file pointer in our PVFS driver with the kinds of types
you are passing in and multiple calls to MPI-IO independent file
pointer routines.
Because we messed up type handling, data would be corrupted on the
second call to MPI_File_write, and as Pete diagnosed, we'd start
running off the end of our typemap array at the third iteration.
In the short term, if you can do all your I/O in a single call to
MPI_File_write, you'll avoid this bug. In the longer term, I'll send
you a patch, but I'm going to be on travel Thursday and Friday. I
hope I can put something together early next week.
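
A rough sketch of that workaround, in case it helps: stage each
iteration's data into one contiguous buffer and issue a single write.
The helper name stage_chunks and its parameters are made up for
illustration, and the sketch assumes every iteration's data is a plain
run of bytes of the same length in memory. Your real buffers and
datatypes may well differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of MPI, ROMIO, or PVFS): copy `niter`
 * chunks of `chunk_len` bytes each into one newly allocated contiguous
 * buffer. Returns NULL if allocation fails; the caller frees the result. */
static char *stage_chunks(const char *const *chunks, size_t niter,
                          size_t chunk_len)
{
    char *staged = malloc(niter * chunk_len);
    if (staged == NULL)
        return NULL;

    /* Lay the chunks end to end in iteration order. */
    for (size_t i = 0; i < niter; i++)
        memcpy(staged + i * chunk_len, chunks[i], chunk_len);

    return staged;
}

/* Intended use, assuming an already opened MPI_File `fh` and a file
 * view that is set once and left unchanged:
 *
 *   MPI_Status status;
 *   char *staged = stage_chunks(chunks, niter, chunk_len);
 *   if (staged != NULL) {
 *       MPI_File_write(fh, staged, (int)(niter * chunk_len),
 *                      MPI_BYTE, &status);
 *       free(staged);
 *   }
 */
```

The point is only that the independent file pointer gets advanced once
instead of once per iteration, which is the step that goes wrong.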
==rob
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B