[Pvfs2-developers] Re: openib-vfs failure

Kyle Schochenmaier kschoche at scl.ameslab.gov
Fri Feb 1 20:07:31 EST 2008


Troy Benjegerdes wrote:
>
>> Pete -
>>
>> I've attached a link to a log of the failure with network debugging on in
>> the client, single IO node.  The whole log is 5.9GB so I only attached the
>> last 10k lines.  Same error as before of course.
>>
>> http://www.scl.ameslab.gov/~kschoche/pvfs2-client.log.gz
>>
>> The mopids are fairly difficult to track, as they are used all over the
>> place and end up here and there. I can't make out anything useful from it
>> :'(
>>
>> Any advice would be great,
>>
>> ~Kyle
>>
>>   
> Here is a full logfile of another failure (93M compressed, 1GB
> unpacked):
>
> http://www.scl.ameslab.gov/~troy/pvfs/pvfs2-client.log.gz

An update of sorts:
After tracing function calls to figure out why the same "mop_id" was
used 10,000+ times during my failed run, Troy and I stumbled upon some
of the FMR code. After changing the id_gen_fast(mopid) functions to use
the id_gen_safe(mopid) functions in id_generator.c, we may have fixed
the problem. The change does introduce some overhead, though. I'll try
to run some tests next week to quantify it, but for now it at least
allows my tests to complete.
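
For anyone following along, here is a rough sketch of the difference as I
understand it. This is not the code from id_generator.c; every name in it
(demo_id_t, demo_fast_*, demo_safe_*, DEMO_SLOTS) is made up for
illustration. The "fast" scheme uses the object's address as the id, so a
lookup is just a cast and cannot tell a live id from a stale one. The
"safe" scheme hands out ids from a counter and keeps a table, so it costs
a search on every lookup but returns NULL for an id that is no longer
registered. The table here is a plain array with no locking, so it is not
thread safe as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the PVFS2 id generator API. */
typedef uint64_t demo_id_t;

/* "Fast" scheme: the id is simply the object's address. */
static demo_id_t demo_fast_register(void *item)
{
    return (demo_id_t)(uintptr_t)item;
}

/* Lookup is a cast; a stale id is returned as a dangling pointer. */
static void *demo_fast_lookup(demo_id_t id)
{
    return (void *)(uintptr_t)id;
}

/* "Safe" scheme: ids come from a counter and are tracked in a table. */
#define DEMO_SLOTS 64

struct demo_slot {
    demo_id_t id;
    void *item;   /* NULL means the slot is free */
};

static struct demo_slot demo_table[DEMO_SLOTS];
static demo_id_t demo_next_id = 1;   /* 0 is reserved for "no id" */

/* Returns a fresh id, or 0 if the table is full. */
static demo_id_t demo_safe_register(void *item)
{
    assert(item != NULL);
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (demo_table[i].item == NULL) {
            demo_table[i].id = demo_next_id++;
            demo_table[i].item = item;
            return demo_table[i].id;
        }
    }
    return 0;
}

/* Returns the registered item, or NULL if the id is unknown or stale. */
static void *demo_safe_lookup(demo_id_t id)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (demo_table[i].item != NULL && demo_table[i].id == id)
            return demo_table[i].item;
    }
    return NULL;
}

/* Frees the slot; the id is never handed out again. */
static void demo_safe_unregister(demo_id_t id)
{
    for (size_t i = 0; i < DEMO_SLOTS; i++) {
        if (demo_table[i].item != NULL && demo_table[i].id == id)
            demo_table[i].item = NULL;
    }
}
```

The extra table search on each lookup is the kind of overhead I would
expect from a scheme like this; how it compares to the real code is
something only measurements will show.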

Maybe something is wrong with the id_gen_fast() code: locking, or some
other issue?

Troy and I had some questions about how these mop_ids, which are just
addresses, are generated, and whether it is possible for two I/O servers
to generate the same address and somehow send it to the client.
Can you give us a brief description of the process, Pete?
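
To make the question concrete, here is a small sketch of one way an
address-based id can repeat. It is an illustration only, not a diagnosis
of the failure: the fixed pool below (pool_op, pool_alloc, pool_free,
addr_id, count_distinct_ids) is hypothetical and stands in for whatever
allocator is really used. If an operation is freed before the next one
is allocated, a first-fit allocator hands back the same slot, so every
operation gets the same address and therefore the same id. Separately,
an address only means something inside the process that produced it, so
two servers can arrive at the same numeric value on their own. Whether
such a value ever reaches the client is the part we don't know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size pool standing in for the real allocator. */
#define POOL_SLOTS 4

struct pool_op {
    int in_use;
    int payload;
};

static struct pool_op pool[POOL_SLOTS];

/* First-fit allocation: returns the lowest free slot, or NULL if full. */
static struct pool_op *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

static void pool_free(struct pool_op *op)
{
    assert(op != NULL);
    op->in_use = 0;
}

/* Address-as-id, as in the "ids are just addresses" description. */
static uint64_t addr_id(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/*
 * Allocates and frees one op at a time, `rounds` times, and returns how
 * many distinct ids were seen. At most POOL_SLOTS distinct addresses
 * exist, so the `seen` array cannot overflow.
 */
static size_t count_distinct_ids(size_t rounds)
{
    uint64_t seen[POOL_SLOTS];
    size_t n_seen = 0;

    for (size_t r = 0; r < rounds; r++) {
        struct pool_op *op = pool_alloc();
        if (op == NULL)
            break;

        uint64_t id = addr_id(op);
        int known = 0;
        for (size_t i = 0; i < n_seen; i++) {
            if (seen[i] == id)
                known = 1;
        }
        if (!known)
            seen[n_seen++] = id;

        pool_free(op);
    }
    return n_seen;
}
```

Starting from an empty pool, count_distinct_ids(10000) returns 1, since
slot 0 is reused every round. So seeing one id many times in a log is
not by itself proof of a bug; it depends on whether the earlier
operation was really finished when the id was reused.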

Thanks,
Kyle

-- 
Kyle Schochenmaier
kschoche at scl.ameslab.gov
Research Assistant, Dr. Brett Bode
AmesLab - US Dept.Energy
Scalable Computing Laboratory 


