[Pvfs2-developers] help with kernel dentry revalidate problem
Sam Lang
slang at mcs.anl.gov
Mon Oct 8 22:42:12 EDT 2007
On Oct 8, 2007, at 7:48 PM, Murali Vilayannur wrote:
> Hi Sam,
>> I was able to verify that your latest patch fixes the problem with
>> the simul test #7, so I went ahead and committed it.
>
> Awesome! I thought Kevin mentioned that it crashed somewhere else.. :(
>
>>
>> Also, when the problem actually existed, running simul #7 a bunch and
>> then trying to unload the kernel module was giving an error:
>
> Ah.. Maybe this was the error he mentioned to me..?
>
>>
>> [ 806.396608] slab error in kmem_cache_destroy(): cache
>> `pvfs2_op_cache': Can't free all objects
>
>> I did some debugging and it looks like the op cache entry that wasn't
>> getting released was from a lookup, and it looks like there's a case
>> where lookup can return an error that's not ENOENT, and the op entry
>> doesn't get released. I've attached a patch that I think fixes the
>> problem. Can you verify that this looks ok?
>
> Awesome! Nice catch! That looks great!
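
The leak described above (a lookup returning an error other than ENOENT without releasing its op) can be illustrated with a minimal userspace sketch. This is not the attached patch or the PVFS2 code: `struct op`, `op_alloc()`, `op_release()`, `demo_lookup()` and the `ops_outstanding` counter are all hypothetical names, and malloc()/free() stand in for the slab cache. The idea shown is simply that one release point covers every return path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry from the op cache. */
struct op {
    int status;
};

/* Counts ops that were allocated but not yet released. */
static int ops_outstanding;

static struct op *op_alloc(void)
{
    struct op *op = malloc(sizeof(*op));
    if (op)
        ops_outstanding++;
    return op;
}

static void op_release(struct op *op)
{
    assert(op != NULL);
    free(op);
    ops_outstanding--;
}

/*
 * Hypothetical lookup. 'status' stands in for whatever result the
 * operation produced (0, -ENOENT, or some other error).
 */
static int demo_lookup(int status)
{
    struct op *op = op_alloc();
    int ret;

    if (!op)
        return -ENOMEM;

    op->status = status;
    ret = op->status;

    /*
     * A leaky variant would return early here for errors other than
     * -ENOENT, skipping the release below. Releasing at one point
     * covers success, -ENOENT and every other error alike.
     */
    op_release(op);
    return ret;
}
```

After any mix of successful and failing calls to demo_lookup(), ops_outstanding should be back at zero, which is the property the slab error at module unload was complaining about.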
>
>> Also, I've seen this
>> error before on other systems (I think Pete has too) and I'm not sure
>> it's always from this one case. Is there a good way to verify that
>> we're releasing ops (and possibly other cache entries)
>> appropriately? Just looking for ideas to harden the code in the
>> kmod.
>
> I guess we could always keep an extra list_head in each object
> allocated off the slab, chain it into a private global linked list,
> and verify that the list is empty prior to module unload's
> kmem_cache_destroy(), perhaps?
> If it is not empty, then free all the remaining elements..
Yeah and write some big warnings to the log about leaking cache
entries. Are there any interfaces to the kmem_cache that allow you
to query the entries you've allocated so that we don't have to do
that ourselves?
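
A rough userspace sketch of the tracking idea discussed here: every live object is chained into a private global list, and the list is checked (with a loud warning per leaked entry) at the point where the module would call kmem_cache_destroy(). The names `struct tracked_op`, `tracked_op_alloc()`, `tracked_op_release()` and `tracked_op_reap_leaks()` are hypothetical, malloc()/free() stand in for the slab cache, and a real kernel version would use list_head and need locking around the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object allocated from the op cache. */
struct tracked_op {
    struct tracked_op *prev, *next; /* links into the global live list */
    int tag;                        /* placeholder payload */
};

/* Circular list head; the list is empty when it points at itself. */
static struct tracked_op live_ops = { &live_ops, &live_ops, 0 };

static struct tracked_op *tracked_op_alloc(int tag)
{
    struct tracked_op *op = malloc(sizeof(*op));
    if (!op)
        return NULL;
    op->tag = tag;
    /* Insert right after the list head. */
    op->next = live_ops.next;
    op->prev = &live_ops;
    live_ops.next->prev = op;
    live_ops.next = op;
    return op;
}

static void tracked_op_release(struct tracked_op *op)
{
    assert(op != NULL);
    /* Unlink from the live list before freeing. */
    op->prev->next = op->next;
    op->next->prev = op->prev;
    free(op);
}

/*
 * Meant to run just before the cache is destroyed: warn about every
 * object still on the list, free it, and return how many leaked.
 */
static int tracked_op_reap_leaks(void)
{
    int leaked = 0;

    while (live_ops.next != &live_ops) {
        struct tracked_op *op = live_ops.next;
        fprintf(stderr, "WARNING: leaked op cache entry (tag %d)\n", op->tag);
        tracked_op_release(op);
        leaked++;
    }
    return leaked;
}
```

A nonzero return from tracked_op_reap_leaks() means some code path forgot to release an entry; because the leftovers are freed on the spot, the cache can then be destroyed cleanly.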
> To find out where we are leaking, we could store the return address
> of the caller of the alloc() function in the object and use that to
> find the offending leaks..
> What do you think?
Sounds good to me.
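
The return-address idea could look roughly like the following userspace sketch. It assumes GCC or Clang, since it relies on the `__builtin_return_address(0)` builtin and the `noinline` attribute; `struct traced_obj`, `traced_alloc()`, `traced_free()` and `traced_report_leaks()` are hypothetical names, and malloc()/free() again stand in for the slab cache.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object that remembers who allocated it. */
struct traced_obj {
    struct traced_obj *next; /* singly linked list of live objects */
    void *alloc_caller;      /* return address of traced_alloc()'s caller */
};

static struct traced_obj *traced_live;

/* noinline keeps __builtin_return_address(0) pointing at the real caller. */
__attribute__((noinline))
static struct traced_obj *traced_alloc(void)
{
    struct traced_obj *obj = malloc(sizeof(*obj));
    if (!obj)
        return NULL;
    obj->alloc_caller = __builtin_return_address(0);
    obj->next = traced_live;
    traced_live = obj;
    return obj;
}

static void traced_free(struct traced_obj *obj)
{
    struct traced_obj **pp;

    assert(obj != NULL);
    /* Find the link that points at obj and splice obj out. */
    for (pp = &traced_live; *pp; pp = &(*pp)->next) {
        if (*pp == obj) {
            *pp = obj->next;
            break;
        }
    }
    free(obj);
}

/* Print the allocation site of every object still live; return the count. */
static int traced_report_leaks(void)
{
    int leaked = 0;
    struct traced_obj *obj;

    for (obj = traced_live; obj; obj = obj->next) {
        fprintf(stderr, "leaked object %p, allocated by caller at %p\n",
                (void *)obj, obj->alloc_caller);
        leaked++;
    }
    return leaked;
}
```

Each reported address falls inside the function that called traced_alloc(), so it narrows a leak down to a specific call site rather than just a cache name.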
-sam
> Thanks!
> Murali
>
>>
>> -sam
>>
>>
>>
>> On Aug 16, 2007, at 12:59 PM, Murali Vilayannur wrote:
>>
>>> Kevin,
>>> Instead of the call to d_add(), can you replace it by a
>>> pvfs2_d_splice_alias() with the same parameters as before and
>>> recompile/reload and see if that fixes the crash.
>>> Something like the attached..
>>> thanks,
>>> Murali
>>>
>>> On 8/16/07, Kevin Harms <harms at alcf.anl.gov> wrote:
>>>> Murali,
>>>>
>>>> I tried the patch (applied it to the 2.6.3 source). It crashes
>>>> on one of the machines.
>>>> I sent you an email with the dmesg output.
>>>>
>>>> Kevin
>>>>
>>>> <dcache.patch>
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>>
>>
>>
>