[Pvfs2-users] unexplainable data corruption
Emmanuel Florac
eflorac at intellique.com
Fri May 30 11:20:37 EDT 2008
On Fri, 30 May 2008 09:59:25 -0500,
Troy Benjegerdes <troy at scl.ameslab.gov> wrote:
> So a bad network card (or maybe pci-X slot) is causing corruption
> that still has a correct TCP checksum?
>
Obviously! And there are more errors with one port than with the other.
On one port I saw about one error every ~8 GB transferred; on the other,
many more. The very bad news is that the errors were completely
silent... I have 15 TB of possibly corrupted data that I'll have to
recreate entirely :(
BTW I've plugged the card into different PCI slots and it behaved the
same. So I'm pretty sure it's the card.
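
As an illustration of how corrupted data can still carry a valid
checksum, here is a minimal sketch of a 16-bit one's-complement sum in
the style of RFC 1071, the kind of checksum TCP uses. The byte strings
are made-up examples, not data from the faulty card, and this is not a
claim about what that card actually did: it only shows that such a
checksum cannot detect reordered 16-bit words, while it does catch a
single flipped bit.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payloads, chosen only for illustration.
original = b"\x12\x34\xab\xcd\x00\x01\xff\xfe"
swapped = b"\xab\xcd\x12\x34\x00\x01\xff\xfe"  # first two words exchanged
flipped = b"\x12\x35\xab\xcd\x00\x01\xff\xfe"  # one bit changed

if __name__ == "__main__":
    print(internet_checksum(original) == internet_checksum(swapped))
    print(internet_checksum(original) == internet_checksum(flipped))
```

Because addition is commutative, the swapped payload sums to the same
value as the original, so a receiver checking only this checksum would
accept it.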
> Can you run tcpdump on both server and client and save a trace?
Well, I'll have to put the faulty card back into another machine. It may
be interesting to check what happens with ssh, nfs, etc.
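
One way such a check over ssh, nfs, or any other transport could be
done is to hash the same file on the sender and on the receiver and
compare the digests. The sketch below is only an assumption about how
one might do it; the function name and chunk size are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this on the source copy and on the transferred copy, then
comparing the two strings, gives an end-to-end check that does not
depend on the network card or on the TCP checksum.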
> You could also try turning off any TCP or checksum offload in the
> network card.
Yes, the Intel Pro1000 has a TOE, it may be faulty. Unfortunately I
don't know how to disable it.
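
For reference, on Linux the offload features of a network interface are
usually toggled with `ethtool -K`. The sketch below only builds the
command lines; it assumes a Linux host with ethtool installed, and the
interface name `eth0` is a placeholder. Which features a given driver
actually lets you change varies, so treat this as a starting point
rather than a tested recipe for this card.

```python
import subprocess
from typing import List


def show_offload_command(interface: str) -> List[str]:
    """Command that lists the current offload settings (ethtool -k)."""
    return ["ethtool", "-k", interface]


def disable_offload_command(interface: str) -> List[str]:
    """Command that turns off RX/TX checksumming, scatter-gather and TSO."""
    return ["ethtool", "-K", interface,
            "rx", "off", "tx", "off", "sg", "off", "tso", "off"]


def run(command: List[str]) -> None:
    """Run a command, raising if it fails; normally requires root."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # "eth0" is an assumed interface name; print instead of executing.
    print(" ".join(show_offload_command("eth0")))
    print(" ".join(disable_offload_command("eth0")))
```

With checksum offload off, the kernel computes and verifies the TCP
checksum in software, which helps tell a faulty offload engine apart
from corruption elsewhere on the card.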
--
----------------------------------------
Emmanuel Florac | Intellique
----------------------------------------