[Pvfs2-developers] Server crash bug(s)
Nicholas Mills
nlmills at g.clemson.edu
Wed May 28 16:29:25 EDT 2008
OK, we narrowed it down to the lookup state machine. It seems that one
of the states was returning SM_ACTION_COMPLETE (1) after posting a job.
As a result, the state machine was being freed while the job was still
in progress.
We changed the return value from SM_ACTION_COMPLETE to the return
value of the job and the server stopped crashing in all of our
previous test cases. A patch against HEAD is attached.
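
To illustrate the failure mode, here is a minimal sketch in plain C. It is
not the PVFS2 code and not the attached patch: the struct, the toy_*
functions, and the driver are hypothetical stand-ins, and the name
SM_ACTION_DEFERRED with value 0 is an assumption. Only SM_ACTION_COMPLETE
and its value 1 are taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the real PVFS2 definitions. Only the name
 * SM_ACTION_COMPLETE and its value 1 come from the thread;
 * SM_ACTION_DEFERRED = 0 is an assumption for illustration. */
enum sm_action { SM_ACTION_DEFERRED = 0, SM_ACTION_COMPLETE = 1 };

/* Hypothetical state machine control block. */
struct toy_smcb {
    bool job_pending;   /* a posted job has not completed yet */
    bool freed;         /* the driver has released this machine */
};

/* Hypothetical job post: queues work and reports that it is not done
 * yet. A real job call could also complete immediately. */
static int toy_job_post(struct toy_smcb *smcb)
{
    smcb->job_pending = true;
    return SM_ACTION_DEFERRED;
}

/* Buggy state action: posts a job but claims the state is complete. */
static int lookup_state_buggy(struct toy_smcb *smcb)
{
    toy_job_post(smcb);
    return SM_ACTION_COMPLETE;
}

/* Fixed state action: passes the job's return value to the driver. */
static int lookup_state_fixed(struct toy_smcb *smcb)
{
    return toy_job_post(smcb);
}

/* Toy driver for a one-state machine: "complete" is treated as
 * permission to tear the machine down. Returns true if the machine
 * was released while a job was still pending. */
static bool toy_run(int (*action)(struct toy_smcb *))
{
    struct toy_smcb smcb = { false, false };
    if (action(&smcb) == SM_ACTION_COMPLETE) {
        smcb.freed = true;   /* stands in for freeing the machine */
    }
    return smcb.freed && smcb.job_pending;
}
```

In this toy, toy_run(lookup_state_buggy) reports a machine released with a
job still pending, while toy_run(lookup_state_fixed) does not, because the
driver only sees "complete" when the job itself says so.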
--Nick
On Wed, May 28, 2008 at 2:47 PM, David Bonnie <dbonnie at parl.clemson.edu> wrote:
> Hey all -
>
> Nick and I seem to have found a fairly hefty bug with the server crashing
> when copying to/from a directory. Obviously this could cause some serious
> problems if someone were to crash the server in the middle of writing
> files.
>
> Here's what we've got so far:
>
> Copying to a PVFS folder (using pvfs2-cp) from both local and pvfs2 share
> space:
> Permissions (of destination folder) / Result / Error
>
> 000 / Failure / server crashes on an assert(0)
> 100 / Success / NA
> 200 / Failure / server crashes with a "double free or corruption" error
> 300 / Success / NA
> 400 / Failure / server crashes on an assert(0)
> 500 / Success / NA
> 600 / Failure / server crashes on an assert(0)
> 700 / Success / NA
>
> For 400 and 600, the server debug log says the following:
> "SM current state or trtbl is invalid"
> "state-machine-fns.c:241 PINT_state_machine_next assertion(0)"
>
> As you can see, any write to a folder without execute permissions will
> crash the server.
>
>
> We checked the same things for reading from a PVFS folder (using pvfs2-cp):
> Permissions (of source folder) / Result / Error
>
> 000 / Failure / server crashes on an assert(0)
> 100 / Success / NA
> 200 / Failure / server crashes on the same assertion on line 241 as above
> 300 / Failure / server doesn't crash, but client will segfault
> 400 / Failure / server crashes on the same assertion on line 241 as above
> 500 / Success / NA
> 600 / Failure / server crashes on the same assertion on line 241 as above
> 700 / Success / NA
>
> pvfs2-ls -l completes as normal for any combination of permissions.
>
> It seems like one (or more) of the state machines is dumping out early
> and throwing the whole thing out of whack. We recreated the storage space
> between each run that failed to ensure that we weren't working with a
> corrupted filespace (since the server was aborting). Any ideas?
>
> This is happening with the code from HEAD on Red Hat Enterprise 5.
>
> - Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lookup.patch
Type: text/x-patch
Size: 484 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20080528/e0b7ff09/lookup.bin