[Pvfs2-developers] patch: error code bug fixes
Phil Carns
pcarns at wastedcycles.org
Tue Mar 20 09:22:09 EST 2007
This patch corrects a variety of error code problems:
- several BMI error codes were not tagged with the BMI error class; the
tag is what allows client state machines to retry on network errors
- likewise, a few flow error codes were missing the flow error class
- ECONNRESET was not understood by BMI or included in the error code mapping
- error codes were printed in hex by some routines
- some kernel operation timeouts were translated as EINTR, which is
misleading (these were changed to ETIMEDOUT)
- timeouts while waiting for a kernel buffer were reported as -1 (EPERM)
rather than ETIMEDOUT
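
As a rough illustration of why the class tag matters for retries, here is
a minimal sketch. The names and bit values are made up for the example
and are not the actual PVFS2 definitions; the only point is that a caller
can decide to retry by looking at the class bits, which fails if the
code was never tagged.

```c
/* Illustrative only: hypothetical names and bit layout, not the real
 * PVFS2 error code definitions. */
#define EX_ERROR_BIT   (1 << 30)  /* marks a value as an error code */
#define EX_CLASS_BMI   (1 << 7)   /* error came from the network layer */
#define EX_CLASS_FLOW  (1 << 8)   /* error came from the flow layer */
#define EX_CLASS_MASK  (EX_CLASS_BMI | EX_CLASS_FLOW)

/* Hypothetical base codes, kept small so they never touch class bits. */
enum { EX_ECONNRESET = 1, EX_ETIMEDOUT = 2 };

/* Build a negative error code tagged with the BMI class. */
int ex_bmi_error(int code)
{
    return -(EX_ERROR_BIT | EX_CLASS_BMI | code);
}

/* Build a negative error code tagged with the flow class. */
int ex_flow_error(int code)
{
    return -(EX_ERROR_BIT | EX_CLASS_FLOW | code);
}

/* Extract the class bits from a negative error code. */
int ex_error_class(int err)
{
    return (-err) & EX_CLASS_MASK;
}

/* In this sketch, only errors tagged as network (BMI) errors are retried. */
int ex_should_retry(int err)
{
    return err < 0 && ex_error_class(err) == EX_CLASS_BMI;
}
```

With this layout, ex_should_retry(ex_bmi_error(EX_ECONNRESET)) is true,
while the same base code without the class bit is not retried.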
I also noticed quite a bit of fragility in how sockio.c handles error
codes, but I didn't address that in this patch other than to work around
one common case. Basically, the issue is that sockio.c still sets errno
and returns -1 when it has a problem (most of the PVFS2 source code APIs
return -PVFS_ERRORs), and bmi_tcp.c then translates it. This is fragile
for a few reasons, but the one that really stands out now is that only
sockio.c knows how to translate h_errno values. As a result, some of the
sockio functions now translate error codes immediately, while others
defer and let bmi_tcp.c do the work. This is confusing :)
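
One way to picture a more uniform convention is to translate at the point
of failure, so that callers only ever see negative error codes. The
sketch below is an assumption about how that could look, not what
sockio.c or bmi_tcp.c actually do; sk_from_errno, sk_ret and the SK_*
codes are hypothetical names.

```c
#include <errno.h>

/* Illustrative only: hypothetical codes, not the real PVFS2 values. */
#define SK_ERROR_BIT (1 << 30)
enum { SK_ECONNRESET = 1, SK_ETIMEDOUT = 2, SK_EUNKNOWN = 3 };

/* Map a system errno value onto a negative library-style error code. */
int sk_from_errno(int e)
{
    switch (e) {
    case ECONNRESET: return -(SK_ERROR_BIT | SK_ECONNRESET);
    case ETIMEDOUT:  return -(SK_ERROR_BIT | SK_ETIMEDOUT);
    default:         return -(SK_ERROR_BIT | SK_EUNKNOWN);
    }
}

/* Convert a "-1 and errno" style return value into the negative-code
 * convention; non-negative results pass through unchanged. */
long sk_ret(long ret)
{
    return (ret < 0) ? sk_from_errno(errno) : ret;
}
```

A socket helper could then end with something like
"return sk_ret(recv(fd, buf, len, 0));", and a separate helper of the
same shape could cover h_errno values, so that the caller never needs to
know which kind of error it was.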
The BMI error handling and ECONNRESET parts of this patch are important
for failover scenarios so that clients are able to pick back up without
error.
The ones related to kernel timeouts are important if you are tuning
kernel buffer sizes. At some point, if your buffers are large enough, you
may end up with processes waiting for buffers long enough to exhaust the
default timeouts.
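
To make the timeout distinction concrete, here is a small sketch of the
intended mapping. The kt_wait enum and kt_wait_to_errno are hypothetical
and are not taken from the kernel module; the idea is only that an
expired wait and an interrupted wait should report different codes.

```c
#include <errno.h>

/* Hypothetical outcome of waiting on a kernel operation or buffer. */
enum kt_wait { KT_DONE, KT_SIGNALED, KT_EXPIRED };

/* Report an expired wait as -ETIMEDOUT rather than -EINTR or a bare -1,
 * so the caller can tell a timeout apart from a signal. */
int kt_wait_to_errno(enum kt_wait outcome)
{
    switch (outcome) {
    case KT_DONE:     return 0;
    case KT_SIGNALED: return -EINTR;
    case KT_EXPIRED:  return -ETIMEDOUT;
    }
    return -EINVAL; /* not reached for valid enum values */
}
```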
-Phil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error-codes.patch
Type: text/x-patch
Size: 14304 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070320/0e984cef/error-codes.bin