[Pvfs2-developers] help with kernel dentry revalidate problem

Sam Lang slang at mcs.anl.gov
Mon Oct 8 22:42:12 EDT 2007


On Oct 8, 2007, at 7:48 PM, Murali Vilayannur wrote:

> Hi Sam,
>> I was able to verify that your latest patch fixes the problem with
>> the simul test #7, so I went ahead and committed it.
>
> Awesome! I thought Kevin mentioned that it crashed somewhere else.. :(
>
>>
>> Also, when the problem actually existed, running simul #7 a bunch and
>> then trying to unload the kernel module was giving an error:
>
> Ah.. Maybe this was the error he mentioned to me..?
>
>>
>> [  806.396608] slab error in kmem_cache_destroy(): cache
>> `pvfs2_op_cache': Can't free all objects
>
>> I did some debugging and it looks like the op cache entry that wasn't
>> getting released was from a lookup. There's a case where lookup can
>> return an error other than ENOENT, and the op entry doesn't get
>> released.  I've attached a patch that I think fixes the problem.  Can
>> you verify that this looks ok?
>
> Awesome! Nice catch! That looks great!
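
The leak described above (an op that is only released on some return paths of lookup) is commonly avoided by sending every exit through one cleanup label. The following is a minimal userspace sketch of that pattern, not the attached patch: `op_entry`, `op_alloc()`, `op_release()`, `ops_outstanding` and `lookup_sketch()` are hypothetical names invented for illustration, and malloc/free stand in for the slab cache.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry taken from the op cache. */
struct op_entry {
    int status;   /* result reported by the simulated server */
};

int ops_outstanding;   /* allocated but not yet released */

struct op_entry *op_alloc(void)
{
    struct op_entry *op = calloc(1, sizeof(*op));

    if (op)
        ops_outstanding++;
    return op;
}

void op_release(struct op_entry *op)
{
    if (!op)
        return;
    ops_outstanding--;
    free(op);
}

/*
 * Simulated lookup. server_status plays the role of whatever the
 * request returned: 0, -ENOENT, or some other error such as -EIO.
 */
int lookup_sketch(int server_status)
{
    struct op_entry *op = op_alloc();
    int ret;

    if (!op)
        return -ENOMEM;

    op->status = server_status;
    ret = op->status;

    if (ret == -ENOENT) {
        /* Not found: a real lookup would add a negative dentry here. */
        ret = 0;
        goto out;
    }
    if (ret < 0)
        goto out;   /* any other error must still release the op */

    /* Success: a real lookup would instantiate the inode here. */
out:
    op_release(op);
    return ret;
}
```

With a single `out:` label, adding a new error case later cannot skip the release, which is the property the bug report is about.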
>
>>  Also, I've seen this
>> error before on other systems (I think Pete has too) and I'm not sure
>> it's always from this one case.  Is there a good way to verify that
>> we're releasing ops (and possibly other cache entries)
>> appropriately?  Just looking for ideas to harden the code in the
>> kmod.
>
> I guess we could embed an extra list_head in each object allocated
> off the slab, chain the objects into a private global linked list, and
> verify that the list is empty just before module unload calls
> kmem_cache_destroy(). If it is not empty, free all the remaining
> elements.

Yeah and write some big warnings to the log about leaking cache  
entries.  Are there any interfaces to the kmem_cache that allow you  
to query the entries you've allocated so that we don't have to do  
that ourselves?
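
For reference, the tracking idea can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: `list_node`, `tracked_op`, `live_ops`, `tracked_op_alloc()`, `tracked_op_release()` and `drain_leaked_ops()` are hypothetical names, malloc/free stand in for the slab cache, and the locking a real kernel module would need around the list is left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal doubly linked list node, playing the role of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

/* Hypothetical tracked object with the extra link embedded in it. */
struct tracked_op {
    struct list_node link;   /* must stay first: node and op share an address */
    int tag;
};

/* Circular list of live objects; empty when it points at itself. */
struct list_node live_ops = { &live_ops, &live_ops };

struct tracked_op *tracked_op_alloc(int tag)
{
    struct tracked_op *op = malloc(sizeof(*op));

    if (!op)
        return NULL;
    op->tag = tag;
    /* Chain the new object in at the head of the global list. */
    op->link.next = live_ops.next;
    op->link.prev = &live_ops;
    live_ops.next->prev = &op->link;
    live_ops.next = &op->link;
    return op;
}

void tracked_op_release(struct tracked_op *op)
{
    /* Unlink before freeing so the list only holds live objects. */
    op->link.prev->next = op->link.next;
    op->link.next->prev = op->link.prev;
    free(op);
}

/*
 * Call just before destroying the cache: warn about every object still
 * on the list, free it, and return how many were leaked.
 */
int drain_leaked_ops(void)
{
    int leaked = 0;

    while (live_ops.next != &live_ops) {
        struct tracked_op *op = (struct tracked_op *)live_ops.next;

        fprintf(stderr, "WARNING: leaked op, tag=%d\n", op->tag);
        tracked_op_release(op);
        leaked++;
    }
    return leaked;
}
```

A nonzero return from `drain_leaked_ops()` is the point at which the big warnings would go to the log, and after it returns the cache can be destroyed without the "Can't free all objects" error.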

> To find out where we are leaking, we could store the return address
> of the caller of the alloc() function in the object and use that to
> track down the offending leaks.
> What do you think?

Sounds good to me.
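
A rough userspace sketch of recording the allocation site, assuming a GCC or Clang toolchain for `__builtin_return_address()` and the `noinline` attribute. `traced_op`, `traced_op_alloc()` and `traced_op_report_leak()` are hypothetical names, and malloc stands in for the slab allocation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object that remembers who allocated it. */
struct traced_op {
    void *alloc_caller;   /* return address of the caller of traced_op_alloc() */
    int tag;
};

/* noinline keeps a real call here, so the return address points at the caller. */
__attribute__((noinline))
struct traced_op *traced_op_alloc(int tag)
{
    struct traced_op *op = malloc(sizeof(*op));

    if (!op)
        return NULL;
    op->alloc_caller = __builtin_return_address(0);
    op->tag = tag;
    return op;
}

/* Print where a leaked object came from. */
void traced_op_report_leak(const struct traced_op *op)
{
    fprintf(stderr, "leaked op tag=%d, allocated from %p\n",
            op->tag, op->alloc_caller);
}
```

The printed address can then be matched against the module's symbol table to find the call site that allocated the leaked entry. The cost is one extra pointer per object.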
-sam
> Thanks!
> Murali
>
>>
>> -sam
>>
>>
>>
>> On Aug 16, 2007, at 12:59 PM, Murali Vilayannur wrote:
>>
>>> Kevin,
>>> Can you replace the call to d_add() with a call to
>>> pvfs2_d_splice_alias(), passing the same parameters as before, then
>>> recompile/reload and see if that fixes the crash?
>>> Something like the attached..
>>> thanks,
>>> Murali
>>>
>>> On 8/16/07, Kevin Harms <harms at alcf.anl.gov> wrote:
>>>> Murali,
>>>>
>>>>         I tried the patch (applied it to the 2.6.3 source). I get
>>>> crashes on one of the machines.
>>>>         I sent you an email with the dmesg output.
>>>>
>>>> Kevin
>>>>
>>>> <dcache.patch>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>>
>>
>>
>


