[Pvfs2-developers] server crash on startup with millions of files

Phil Carns pcarns at wastedcycles.org
Tue Feb 20 11:58:26 EST 2007


Hi Sam,

Thanks for the suggestions and for the insight on the error codes.

I think tomorrow I'll try to replicate the problem we saw in a simpler
single-server environment (the file system we saw this on is busy now).
That might make it easier to step through your suggestions, starting
with just upgrading to a newer version.  I didn't realize that those
error code changes might have an impact here.

-Phil

Sam Lang wrote:
> 
> On Feb 20, 2007, at 6:29 AM, Phil Carns wrote:
> 
>> Hi guys,
>>
>> We have run into a problem recently with a configuration that looks
>> like this:
>>
>> - x86_64 architecture
>> - 16 servers
>> - SAN based storage
>> - approximately 1.4 million files on PVFS
>>
>> Everything works fine, except when we stop and then later restart one
>> of the pvfs2-server daemons.  At least one of them usually (but not
>> quite always) crashes before the file system is ready to be mounted.
>>
>> We captured a core file and can see that it died on this assertion in
>> the dbpf_dspace_test() function:
>>
>> dbpf-dspace.c:1371
>> assert(!dbpf_op_queue_empty(dbpf_completion_queue_array[context_id]));
>>
>> According to the stack trace, this test() call followed a
>> trove_dspace_iterate_handles() call within the
>> trove_check_handle_ranges() function.  This is part of the logic on
>> startup that scans all of the handles in the storage space to update
>> the list of available/used handles in trove-handle-mgmt.
>>
>> We found that we can completely work around the problem by manually
>> setting the coll_p->immediate_completion flag during the
>> trove_check_handle_ranges() function.  That forces the
>> iterate_handles() function to do all of its processing up front
>> without using a test function.  There is just some sort of bad
>> interaction when the two functions are used together.
>>
>> As a side note, setting the "ImmediateCompletion" config file option
>> does not work around the problem, because that flag does not take
>> effect until after this assertion occurs.  The set_info calls in
>> pvfs2-server just happen to be in the wrong order.  We would probably
>> not have used this approach anyway, because we haven't fully tested
>> the performance impact of enabling immediate completion for everything.
>>
>> Anyone have any suggestions about what the real problem is here?
>> While the workaround is fine to keep us running for now, it seems
>> like there is an underlying issue to be addressed.
> 
> 
> Hi Phil,
> 
> It looks like the completion queue is empty but the state is set to
> OP_COMPLETED, which we assert shouldn't ever happen.  In the dbpf
> thread function, we essentially add anything to the completion queue
> that's either DBPF_OP_COMPLETE (1) or an error (which we assume to be
> negative).  We leave 0 (DBPF_OP_CONTINUE) and other positive values for
> operations that need to be re-queued.  There's a special case I've seen
> before though, where a DB call returns an error that the
> dbpf_db_error_to_trove_error function doesn't recognize as a DB error
> to translate and so returns -4243, but in the dspace code (including
> iterate_handles), we do:
> 
> ret = -dbpf_db_error_to_trove_error(db_ret);
> 
> so ret ends up being positive.  I've tried to fix this in a recent
> version of the 2.6 branch and head, by checking that the error isn't
> -4243 or 4243 in the thread code, but I think for older versions the op
> gets added back to the queue or just ends up in la-la land.
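
A minimal sketch of the sign problem described above, not the actual PVFS
code.  Only the values 0, 1 and 4243 and the negation pattern come from the
message; translate_db_error(), service_result(), goes_to_completion_queue(),
the UNKNOWN_DB_ERROR name and the "known" error code 1002 are hypothetical
stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Values quoted in the message above. */
#define DBPF_OP_CONTINUE 0
#define DBPF_OP_COMPLETE 1
#define UNKNOWN_DB_ERROR 4243 /* hypothetical name for the 4243 sentinel */

/* Hypothetical stand-in for dbpf_db_error_to_trove_error(): recognized DB
 * errors map to a positive code, anything else maps to -4243. */
static int translate_db_error(int db_ret)
{
    switch (db_ret) {
    case 0:
        return 0;
    case 2:
        return 1002;              /* made-up positive code for a known error */
    default:
        return -UNKNOWN_DB_ERROR; /* unrecognized: already negative */
    }
}

/* The pattern quoted above: negate whatever the translator returns. */
static int service_result(int db_ret)
{
    return -translate_db_error(db_ret);
}

/* Rule described above: only DBPF_OP_COMPLETE or a negative error goes to
 * the completion queue; everything else is treated as "re-queue". */
static int goes_to_completion_queue(int ret)
{
    return ret == DBPF_OP_COMPLETE || ret < 0;
}
```

Under these assumptions a recognized error (db_ret 2) comes out negative and
is queued for completion, while an unrecognized one (say db_ret 99) comes out
as +4243 and is treated as an operation to re-queue, which would match the
symptom of an op that never reaches the completion queue.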
> 
> In any case, it _might_ help to upgrade to the latest HEAD or 2.6
> branch if possible.  Also, you could test my theory by adding an
> assertion for anything that isn't DBPF_OP_COMPLETE in the
> dbpf-thread.c:dbpf_do_one_work_cycle function.
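
One way such a check might look, as a hedged sketch rather than the real
dbpf_do_one_work_cycle() or the actual patch.  The classify() and
normalize_unknown_error() helpers, the disposition enum and the
UNKNOWN_DB_ERROR name are hypothetical; the assertion here is a looser
variant of the one suggested above, since it still accepts 0 and negative
errors and only trips on unexpected positive values.

```c
#include <assert.h>

/* Values quoted in the message above. */
#define DBPF_OP_CONTINUE 0
#define DBPF_OP_COMPLETE 1
#define UNKNOWN_DB_ERROR 4243 /* hypothetical name for the 4243 sentinel */

/* Hypothetical outcome of one pass over an operation. */
enum disposition { REQUEUE, COMPLETE };

/* Treat both +4243 and -4243 as the same (negative) error, in the spirit
 * of the fix described above. */
static int normalize_unknown_error(int ret)
{
    if (ret == UNKNOWN_DB_ERROR || ret == -UNKNOWN_DB_ERROR)
        return -UNKNOWN_DB_ERROR;
    return ret;
}

static enum disposition classify(int ret)
{
    ret = normalize_unknown_error(ret);

    /* Debugging aid: after normalization only 0, 1 or a negative error
     * are expected, so any other positive value aborts here. */
    assert(ret == DBPF_OP_CONTINUE || ret == DBPF_OP_COMPLETE || ret < 0);

    if (ret == DBPF_OP_COMPLETE || ret < 0)
        return COMPLETE; /* would be moved to the completion queue */
    return REQUEUE;      /* DBPF_OP_CONTINUE: needs another pass */
}
```

With the normalization step removed, classify(4243) would hit the assertion,
which is one way to confirm or rule out the theory on an older tree.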
> 
> If my theory is correct, then the next question is why db is returning
> an error that trove doesn't understand.  Did you upgrade Berkeley DB?
> What's the actual error and why is iterate_handles causing it?
> 
> If this isn't the problem, it would be helpful to know what the return
> value is from iterate_handles_op_svc.
> 
> The changes I made to dbpf-thread.c are at:
> 
> http://www.pvfs.org/fisheye/browse/PVFS/src/io/trove/trove-dbpf/dbpf-thread.c?r1=1.36&r2=1.37
> 
> I defined DBPF_ERROR_UNKNOWN to 4243.
> 
> -sam
> 
>>
>> I apologize that I don't have an exact stack dump to paste in the
>> email, but if we need any further information from the core file I
>> think I can still get it loaded up on another machine to look at.
>>
>> Oh, and one other detail: the memory usage of the servers looks fine
>> during startup, so this doesn't appear to be a memory leak.  There is
>> quite a bit of CPU work, but I am guessing that is just Berkeley DB
>> keeping busy in the iteration function.
>>
>> thanks,
>> -Phil
>>
> 


