[Pvfs2-developers] Re: openib-vfs failure
Kyle Schochenmaier
kschoche at scl.ameslab.gov
Fri Feb 1 20:07:31 EST 2008
Troy Benjegerdes wrote:
>
>> Pete -
>>
>> I've attached a link to a log of the failure with network debugging on in
>> the client, single IO node. The whole log is 5.9GB so I only attached the
>> last 10k lines. Same error as before of course.
>>
>> http://www.scl.ameslab.gov/~kschoche/pvfs2-client.log.gz
>>
>> The mopids are fairly difficult to track as they are used all over the
>> place and end up here and there; I can't make out anything useful from it
>> :'(
>>
>> Any advice would be great,
>>
>> ~Kyle
>>
>>
> Here is another, full logfile of another failure: (93M compressed, 1GB
> unpacked)
>
> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client.log.gz
A bit of an update:
After tracing function calls to figure out why the same "mop_id" was
used 10,000+ times during my failed run, Troy and I stumbled upon some
of the fmr code. After changing the id_gen_fast(mopid) functions to use
the id_gen_safe(mopid) functions from id_generator.c, we may have fixed
the problem. However, this does introduce some overhead. I'll try to run
some tests next week to quantify it, but for now the change at least
allows my tests to complete.
Maybe something is wrong with the id_gen_fast() code: a locking problem,
or some other issue?
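A minimal sketch of the difference between the two id styles, to make the
question concrete. This is not the PVFS2 code: every name below
(sketch_id_t, fast_register, safe_register, and so on) is made up for
illustration. It assumes only that the "fast" style uses the item's
address as its id, and that the "safe" style hands out ids from a
lock-protected table instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_id_t;   /* hypothetical id type */

/* "Fast" style: the id IS the item's address. No table and no lock,
 * but the id is only meaningful while the item stays allocated, and
 * a later item placed at the same address gets the same id. */
static sketch_id_t fast_register(void *item)
{
    return (sketch_id_t)(uintptr_t)item;
}

static void *fast_lookup(sketch_id_t id)
{
    return (void *)(uintptr_t)id;
}

/* "Safe" style (assumed design): ids come from a counter and are
 * mapped to items through a small table guarded by a mutex. */
#define SAFE_SLOTS 64
static struct { sketch_id_t id; void *item; } safe_table[SAFE_SLOTS];
static sketch_id_t safe_next = 1;          /* 0 is reserved for "no id" */
static pthread_mutex_t safe_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a fresh id, or 0 if the table is full. */
static sketch_id_t safe_register(void *item)
{
    sketch_id_t id = 0;
    pthread_mutex_lock(&safe_lock);
    for (size_t i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == 0) {
            id = safe_next++;
            safe_table[i].id = id;
            safe_table[i].item = item;
            break;
        }
    }
    pthread_mutex_unlock(&safe_lock);
    return id;
}

/* Returns the registered item, or NULL for an unknown or stale id. */
static void *safe_lookup(sketch_id_t id)
{
    void *item = NULL;
    if (id == 0)
        return NULL;
    pthread_mutex_lock(&safe_lock);
    for (size_t i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == id) {
            item = safe_table[i].item;
            break;
        }
    }
    pthread_mutex_unlock(&safe_lock);
    return item;
}

/* Frees the slot; the id value itself is never handed out again. */
static void safe_unregister(sketch_id_t id)
{
    if (id == 0)
        return;
    pthread_mutex_lock(&safe_lock);
    for (size_t i = 0; i < SAFE_SLOTS; i++) {
        if (safe_table[i].id == id) {
            safe_table[i].id = 0;
            safe_table[i].item = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&safe_lock);
}
```

In this sketch a stale "safe" id simply fails to look up, whereas a stale
"fast" id is cast straight back to a pointer. The lock and table walk on
every call are also where the extra overhead would come from.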
Troy and I had some questions about how these mop_ids, which are just
addresses, are generated, and whether it is possible for two I/O servers
to generate the same address and somehow send it to the client.
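To illustrate why the question matters: an address is unique only within
one process's address space, so two separate processes can legitimately
produce the same numeric value. The sketch below is hypothetical and does
not describe how PVFS2 actually routes ids. It only shows that a table
keyed by the raw value alone could not tell two issuers apart, while a
key that also records the issuer can. All names (tagged_id, issuer,
table_insert, table_find) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical composite key: who issued the id, plus the id itself. */
struct tagged_id {
    int      issuer;   /* made-up index of the issuing process */
    uint64_t raw_id;   /* address-derived value, unique only per issuer */
};

struct entry {
    struct tagged_id key;
    const char      *what;
    int              used;
};

#define MAX_ENTRIES 16
static struct entry table[MAX_ENTRIES];

static int tagged_equal(struct tagged_id a, struct tagged_id b)
{
    return a.issuer == b.issuer && a.raw_id == b.raw_id;
}

/* Returns 0 on success, -1 if the key already exists or the table is full. */
static int table_insert(struct tagged_id key, const char *what)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (table[i].used && tagged_equal(table[i].key, key))
            return -1;
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (!table[i].used) {
            table[i].key = key;
            table[i].what = what;
            table[i].used = 1;
            return 0;
        }
    }
    return -1;
}

/* Returns the stored description, or NULL if the key is not present. */
static const char *table_find(struct tagged_id key)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (table[i].used && tagged_equal(table[i].key, key))
            return table[i].what;
    return NULL;
}
```

With this layout, the same raw value arriving from issuer 0 and issuer 1
lands in two distinct entries, and only a true duplicate (same issuer,
same value) is rejected.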
Can you give us a brief description of the process, Pete?
Thanks,
Kyle
--
Kyle Schochenmaier
kschoche at scl.ameslab.gov
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory