[Pvfs2-developers] server crash on startup with millions of files
Phil Carns
pcarns at wastedcycles.org
Tue Feb 20 11:58:26 EST 2007
Hi Sam,
Thanks for the suggestions and for the insight on the error codes.
I think tomorrow I'll try to replicate the problem we saw in a simpler
single server environment (the file system we saw this on is busy now).
That might make it easier to step through your suggestions, starting
with just upgrading to a newer version. I didn't realize that those
error code changes might have an impact here.
-Phil
Sam Lang wrote:
>
> On Feb 20, 2007, at 6:29 AM, Phil Carns wrote:
>
>> Hi guys,
>>
>> We have run into a problem recently with a configuration that looks
>> like this:
>>
>> - x86_64 architecture
>> - 16 servers
>> - SAN based storage
>> - approximately 1.4 million files on PVFS
>>
>> Everything works fine, except when we stop and then later restart one
>> of the pvfs2-server daemons. At least one of them usually (but not
>> quite always) crashes before the file system is ready to be mounted.
>>
>> We captured a core file and can see that it died on this assertion in
>> the dbpf_dspace_test() function:
>>
>> dbpf-dspace.c:1371
>> assert(!dbpf_op_queue_empty(dbpf_completion_queue_array[context_id]));
>>
>> According to the stack trace, this test() call followed a
>> trove_dspace_iterate_handles() call within the
>> trove_check_handle_ranges() function. This is part of the logic on
>> startup that scans all of the handles in the storage space to update
>> the list of available/used handles in trove-handle-mgmt.
>>
>> We found that we can completely work around the problem by manually
>> setting the coll_p->immediate_completion flag during the
>> trove_check_handle_ranges() function. That forces the
>> iterate_handles() function to do all of its processing up front
>> without using a test function. There is just some sort of bad
>> interaction when the two functions are used together.
>>
>> As a side note, setting the "ImmediateCompletion" config file option
>> does not work around the problem, because that flag does not take
>> effect until after this assertion occurs. The set_info calls in
>> pvfs2-server just happen to be in the wrong order. We would probably
>> not have used this approach anyway, because we haven't fully tested
>> the performance impact of enabling immediate completion for everything.
>>
>> Anyone have any suggestions about what the real problem is here?
>> While the workaround is fine to keep us running for now, it seems
>> like there is an underlying issue to be addressed.
>
>
> Hi Phil,
>
> It looks like the completion queue is empty but the state is set to
> OP_COMPLETED, which we assert shouldn't ever happen. In the dbpf
> thread function, we essentially add anything to the completion queue
> that's either DBPF_OP_COMPLETE (1) or an error (which we assume to be
> negative). We leave 0 (DBPF_OP_CONTINUE) and any other positive value
> for operations that need to be re-queued. There's a special case I've
> seen before though, where a DB call returns an error that the
> dbpf_db_error_to_trove_error function doesn't recognize and so can't
> translate. It returns -4243 in that case, but in the dspace code
> (including iterate_handles), we do:
>
> ret = -dbpf_db_error_to_trove_error(db_ret);
>
> so ret ends up being positive (4243). I've tried to fix this in a
> recent version of the 2.6 branch and HEAD, by checking in the thread
> code that the error isn't -4243 or 4243, but I think in older versions
> the op gets added back to the queue or just ends up in la-la land.
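
[Editor's sketch] The sign problem described above can be reproduced in a few lines of standalone C. This is a minimal sketch, not the PVFS2 code: every sketch_* name, the DB error values 2 and -30999, and the translated code 17 are made up for illustration. Only the values 0, 1 and 4243 and the negation come from the message. The "old" classifier assumes the re-queue outcome, which the message itself only offers as a guess.

```c
#include <assert.h>

/* Values quoted in the thread; the SKETCH_* names are stand-ins. */
#define SKETCH_OP_CONTINUE   0     /* operation must be re-queued */
#define SKETCH_OP_COMPLETE   1     /* operation finished */
#define SKETCH_ERROR_UNKNOWN 4243  /* magnitude of the "unrecognized" code */

enum sketch_action { SKETCH_REQUEUE, SKETCH_TO_COMPLETION_QUEUE };

/* Hypothetical translator: a recognized DB error maps to a positive
 * code, anything else yields -4243 as described in the thread. */
static int sketch_db_error_to_trove_error(int db_ret)
{
    switch (db_ret) {
    case 2:  return 17;                    /* made-up recognized error */
    default: return -SKETCH_ERROR_UNKNOWN; /* unrecognized error */
    }
}

/* Mirrors the pattern quoted above: negate the translated error. */
static int sketch_service_op(int db_ret)
{
    if (db_ret != 0)
        return -sketch_db_error_to_trove_error(db_ret);
    return SKETCH_OP_COMPLETE;
}

/* Assumed older behavior: only 1 or a negative value completes. */
static enum sketch_action sketch_classify_old(int ret)
{
    if (ret == SKETCH_OP_COMPLETE || ret < 0)
        return SKETCH_TO_COMPLETION_QUEUE;
    return SKETCH_REQUEUE;
}

/* Sketch of the described fix: also treat +/-4243 as an error. */
static enum sketch_action sketch_classify_checked(int ret)
{
    if (ret == SKETCH_ERROR_UNKNOWN || ret == -SKETCH_ERROR_UNKNOWN)
        return SKETCH_TO_COMPLETION_QUEUE;
    return sketch_classify_old(ret);
}

/* Walks one unrecognized DB error (-30999, made up) through both paths. */
void sketch_run_checks(void)
{
    int ret = sketch_service_op(-30999);

    assert(ret == SKETCH_ERROR_UNKNOWN);  /* double negation: positive */
    assert(sketch_classify_old(ret) == SKETCH_REQUEUE);
    assert(sketch_classify_checked(ret) == SKETCH_TO_COMPLETION_QUEUE);
}
```

Calling sketch_run_checks() from a main() exercises the case in question: the unrecognized error comes back as +4243, the older classification treats it as work to re-queue, and the extra check routes it to the completion queue as an error.
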
>
> In any case, it _might_ help to upgrade to the latest HEAD or 2.6
> branch if possible. Also, you could test my theory by adding an
> assertion for anything that isn't DBPF_OP_COMPLETE in the
> dbpf-thread.c:dbpf_do_one_work_cycle function.
>
> If my theory is correct, then the next question is why Berkeley DB is
> returning an error that trove doesn't understand. Did you upgrade
> Berkeley DB? What's the actual error and why is iterate_handles
> causing it?
>
> If this isn't the problem, it would be helpful to know what the return
> value is from iterate_handles_op_svc.
>
> The changes I made to dbpf-thread.c are at:
>
> http://www.pvfs.org/fisheye/browse/PVFS/src/io/trove/trove-dbpf/dbpf-thread.c?r1=1.36&r2=1.37
>
> I defined DBPF_ERROR_UNKNOWN to 4243.
>
> -sam
>
>>
>> I apologize that I don't have an exact stack dump to paste in the
>> email, but if we need any further information from the core file I
>> think I can still get it loaded up on another machine to look at.
>>
>> Oh, and one other detail: the memory usage of the servers looks fine
>> during startup, so this doesn't appear to be a memory leak. There is
>> quite a bit of CPU work, but I am guessing that is just Berkeley DB
>> keeping busy in the iteration function.
>>
>> thanks,
>> -Phil
>>
>