[Pvfs2-developers] Server crash bug(s)
Sam Lang
slang at mcs.anl.gov
Thu May 29 11:49:45 EDT 2008
Great find, guys. It looks like this was introduced with the SM
changes a while back. Maybe no one removes the execute bit from
their directories; otherwise, hopefully, we would have seen this sooner.
Another motivating instance for getting a good unit testing framework
and code coverage analysis setup.
Can you commit the fix to HEAD?
Thanks,
-sam
On May 28, 2008, at 3:29 PM, Nicholas Mills wrote:
> OK, we narrowed it down to the lookup state machine. One of its
> states was returning complete (1) after posting a job, so the
> state machine was being freed while the job was still in
> progress.
>
> We changed the return value from SM_ACTION_COMPLETE to the return
> value of the job and the server stopped crashing in all of our
> previous test cases. A patch against HEAD is attached.
>
> --Nick
>
> On Wed, May 28, 2008 at 2:47 PM, David Bonnie <dbonnie at parl.clemson.edu> wrote:
>> Hey all -
>>
>> Nick and I seem to have found a fairly hefty bug: the server crashes
>> when copying to/from a directory. Obviously this could cause serious
>> problems if the server were to crash in the middle of writing files.
>>
>> Here's what we've got so far:
>>
>> Copying to a PVFS folder (using pvfs2-cp) from both local and PVFS2
>> space:
>> Permissions (of destination folder) / Result / Error
>>
>> 000 / Failure / server crashes on an assert(0)
>> 100 / Success / NA
>> 200 / Failure / server crashes with a "double free or corruption" error
>> 300 / Success / NA
>> 400 / Failure / server crashes on an assert(0)
>> 500 / Success / NA
>> 600 / Failure / server crashes on an assert(0)
>> 700 / Success / NA
>>
>> For 400 and 600, the server debug log says the following:
>> "SM current state or trtbl is invalid"
>> "state-machine-fns.c:241 PINT_state_machine_next assertion(0)"
>>
>> As you can see, any write to a folder without execute permission
>> will crash the server.
>>
>>
>> We checked the same cases for reading from a PVFS folder (using pvfs2-cp):
>> Permissions (of source folder) / Result / Error
>>
>> 000 / Failure / server crashes on an assert(0)
>> 100 / Success / NA
>> 200 / Failure / server crashes on the same assertion on line 241 as above
>> 300 / Failure / server doesn't crash, but client will segfault
>> 400 / Failure / server crashes on the same assertion on line 241 as above
>> 500 / Success / NA
>> 600 / Failure / server crashes on the same assertion on line 241 as above
>> 700 / Success / NA
>>
>> pvfs2-ls -l completes as normal for any combination of permissions.
>>
>> It seems like one (or more) of the state machines is bailing out
>> early and throwing the whole thing out of whack. We recreated the
>> storage space between failed runs to make sure we weren't working
>> with a corrupted filespace (since the server was aborting). Any ideas?
>>
>> This is happening with the code from HEAD on Red Hat Enterprise Linux 5.
>>
>> - Dave
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> <lookup.patch>