[Pvfs2-developers] Re: mpi-io tests
Rob Ross
rross at mcs.anl.gov
Wed Sep 20 04:10:47 EDT 2006
Thanks for figuring this out, Sam! -- Rob
Sam Lang wrote:
>
> Hi Guys,
>
> I finally found the bug(s) causing these hangs. The first problem was
> in the request scheduler and was related to the new crdirent
> pass-through changes. The handle the crdirent gets queued on is
> actually the directory handle, not the dirent handle, so the operation
> on that handle is technically read-only. Treating it as modifying was
> causing other operations that came along (setattr, for example) to get
> queued instead of scheduled.
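A minimal Python sketch of the scheduling problem described above, under a simplified assumption: an operation on a handle may start immediately unless an earlier pending operation on that same handle is marked as modifying. `Op`, `HandleQueue`, `post`, and `complete` are hypothetical names, not the PVFS2 request-scheduler API, and the real scheduler's rules may differ. The later setattr is assumed to target the same directory handle that crdirent was posted on.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare ops by identity so list.remove() drops the right one
class Op:
    name: str
    modifying: bool          # True if the op changes the object behind the handle
    scheduled: bool = False  # True once the op is allowed to run


@dataclass
class HandleQueue:
    """Hypothetical per-handle list of pending ops, in arrival order."""
    ops: list = field(default_factory=list)

    def _blocked(self, index: int) -> bool:
        # Simplified rule (an assumption): an op must wait only if some
        # earlier pending op on the same handle is modifying.
        return any(op.modifying for op in self.ops[:index])

    def post(self, name: str, modifying: bool) -> Op:
        op = Op(name, modifying)
        self.ops.append(op)
        op.scheduled = not self._blocked(len(self.ops) - 1)
        return op

    def complete(self, op: Op) -> None:
        """Remove a finished op and let any newly unblocked ops start."""
        self.ops.remove(op)
        for i, pending in enumerate(self.ops):
            if not pending.scheduled and not self._blocked(i):
                pending.scheduled = True


# crdirent is posted against the *directory* handle.
wrong = HandleQueue()
crdirent = wrong.post("crdirent", modifying=True)  # mis-classified
setattr_op = wrong.post("setattr", modifying=True)
print(setattr_op.scheduled)  # False: queued behind crdirent
wrong.complete(crdirent)
print(setattr_op.scheduled)  # True: released only once crdirent finishes

fixed = HandleQueue()
fixed.post("crdirent", modifying=False)  # read-only w.r.t. the directory handle
print(fixed.post("setattr", modifying=True).scheduled)  # True: scheduled at once
```

The only difference between the two runs is the `modifying` flag on crdirent; with it cleared, setattr no longer has to wait for crdirent to be released.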
>
> That was the first cause of the hangs. The second was in the way the sync
> coalescing code worked. There were cases where operations were getting
> queued as ready-to-be-synced (coalesced), but the operations serviced
> after them were failing (appropriately, with EEXIST) and so never
> called any of the coalescing code. Julian and I had talked about this
> being a problem a while back, but I guess it never got looked at. In
> any case, I was able to clean up the sync coalescing code somewhat, so
> it was probably worth it.
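A minimal Python sketch of the sync-coalescing failure mode described above, assuming a coalescer that flushes a batch of written-but-unsynced operations either when the batch is full or when no other operations remain in flight. `SyncCoalescer`, `start`, `finish`, `batch_size`, and `check_on_error` are hypothetical names, not PVFS2's actual code. If a failed operation (here one returning EEXIST) skips the flush check, an operation already waiting for a sync is never acknowledged, which a client would see as a hang.

```python
import errno


class SyncCoalescer:
    """Hypothetical model of sync coalescing (illustrative names only).

    Successful modifying ops wait in `pending`; one sync then acknowledges
    the whole batch. A flush happens when the batch reaches `batch_size`
    or when no other ops are still in flight."""

    def __init__(self, batch_size: int = 8, check_on_error: bool = True):
        self.batch_size = batch_size          # assumed flush threshold
        self.check_on_error = check_on_error  # False reproduces the hang
        self.in_flight = 0
        self.pending = []  # written but not yet synced
        self.acked = []    # synced and acknowledged to the client
        self.syncs = 0

    def start(self) -> None:
        self.in_flight += 1

    def finish(self, name: str, err: int = 0) -> None:
        """Called when an op completes; err is 0 or an errno value."""
        self.in_flight -= 1
        if err == 0:
            self.pending.append(name)
        elif not self.check_on_error:
            return  # bug: a failed op never reaches the coalescing check
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if self.pending and (self.in_flight == 0
                             or len(self.pending) >= self.batch_size):
            self.syncs += 1  # one sync covers every pending op
            self.acked.extend(self.pending)
            self.pending.clear()


def race(check_on_error: bool) -> SyncCoalescer:
    """Two creates of the same file; the second fails with EEXIST."""
    c = SyncCoalescer(check_on_error=check_on_error)
    c.start()
    c.start()
    c.finish("create#1")                    # succeeds, waits for a sync
    c.finish("create#2", err=errno.EEXIST)  # fails; no ops left in flight
    return c


print(race(check_on_error=False).pending)  # ['create#1']: never synced
print(race(check_on_error=True).acked)     # ['create#1']: synced and acked
```

The fix modeled here is simply to run the flush check on every completion, whether or not the operation succeeded.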
>
> The tests that do multiple simultaneous creates and unlinks of the same
> file seem to work fine now, including the open test in test/posix. Let
> me know if any of you still have problems.
>
> Thanks,
>
> -sam
>
> Murali Vilayannur wrote:
>> Hi RobL,
>>> I'm seeing this on Chiba with POSIX IOR and the FLASH I/O benchmark
>>> (oddly enough, only with the Parallel netCDF version, not the HDF5
>>> one).
>>>
>>> I agree it's something related to Chiba, but I have no idea what it
>>> could be. I tried a different version of Berkeley DB and saw the same
>>> results.
>>>
>>> pvfs2-ping works, but pvfs2-ls hangs in getattr. The servers don't
>>> *appear* to be stuck in anything, but I only hooked up a debugger to
>>> the server-that-wouldn't-die.
>>
>> I really wish we could write this off as a Chiba-specific bug, since it
>> is so hard to reproduce elsewhere (otherwise we would have known by now!) :)
>> Since I was using the VFS interface and not the MPI-IO interface,
>> the only likely culprit I can think of is the simultaneous create and
>> unlink.
>> The other possibility is a bug in the AIO libraries on Chiba...
>> If anyone has any insights or has seen similar behavior, do let us know!
>> thanks,
>> Murali
>>
>>