[Pvfs2-developers] Error with concurrent opens

Phil Carns carns at mcs.anl.gov
Fri Aug 1 10:25:19 EDT 2008


Doh, sorry to hear about the merge problem, but I am relieved to know we 
don't have another bug floating around on this path!  Thanks for the update.

-Phil

Bart Taylor wrote:
> I finally narrowed it down. It turns out we had a problem merging the 
> previous release, but it did not show up since we never got a chance to 
> test it. Sam added an op_release in namei.c to fix a kmem_cache leak, 
> and it sneaked in twice without warning. Taking that out fixed the problem.
> 
> Bart.
> 
> 
> 
> On Tue, Jul 29, 2008 at 8:03 AM, Phil Carns <carns at mcs.anl.gov> wrote:
> 
>     I'm having a hard time thinking of anything specific that would have
>     impacted this.  You could maybe try to narrow it down some by taking
>     a diff of just the src/kernel/linux-2.6 directory and apply that to
>     a 2.7.1 tree to test and see if it is something specifically in the
>     kernel module code.
> 
>     -Phil
> 
>     Bart Taylor wrote:
> 
>         I ran the test the same way you mentioned - outside of the LTP
>         framework - and still had the problem. I have applied the patch
>         that fixed the rename06 test as well as the kernel buffer
>         overflow fix from a few days ago and still have the problem.
> 
>         I did a CVS export of head this morning and used the same
>         configure and build as last time. I ran the open file test
>         against a file system created from head and against a 271 file
>         system (with some recent patches) and both tests succeed, so it
>         seems like the fix is somewhere between the 271 release and
>         head, but I am not sure where. Do you have an idea where it
>         might be lurking?
> 
>         Bart.
> 
> 
> 
>         On Fri, Jul 25, 2008 at 7:16 AM, Phil Carns <carns at mcs.anl.gov>
>         wrote:
> 
>            Phil Carns wrote:
> 
>                Bart Taylor wrote:
> 
>                    I am having a problem with an LTP test from the 20080630
>                    set of LTP tests. The 'openfile01' test does 10 threaded
>                    opens of 10 files. It is attached in case you need a
>                    copy. The test completes successfully, but an 'ls'
>                    command immediately after that hangs and cannot be
>                    killed. Eventually the node hangs as well. Any command
>                    that touches the file system will trigger the problem.
> 
>                    We also tried this with the 2.7.1 release tarball and see
>                    the same problem. The setup is a single-node file system
>                    running RHEL4 and a 2.6.9-67 kernel. The client was on
>                    the same node.
> 
>                    Here is the configure line used:
> 
>                      ./configure --with-kernel=/lib/modules/`uname -r`/build
> 
>                    and how the client was started:
> 
>                      ./pvfs2-client -p ./pvfs2-client-core
> 
>                    The fs.conf file is attached.
> 
>                    The client debug mask was set to 'all', and
>                    /proc/sys/pvfs2/debug had a value of 32767. But once the
>                    'ls' command was issued, there were no log messages.
> 
>                    Does anyone else see this error?
> 
>                    Bart.
> 
> 
>                Are you able to reproduce this running openfile by itself
>                after a fresh boot?  It looks like openfile operates on a
>                file in the current working directory, so I have been
>                trying to run it like this:
> 
>                <mount pvfs2 on /mnt/pvfs2>
>                cd /mnt/pvfs2
>                ~/openfile -f10 -t10
>                ls -alh
> 
>                So far I haven't had any trouble with that particular
>                combination.  I'm running it on a centos4 box with a very
>                similar kernel.  The openfile test looks fairly innocent:
>                with those arguments, each of 10 separate threads opens the
>                same single file 10 times (for a total of 100 file
>                descriptors open to the same file), if I understand
>                correctly.
> 
>                If I try to run a full LTP test, however, I do have other
>                problems.  In particular, the rename06 test hangs.  I can
>                trigger that one by itself as follows:
> 
>                export TMPDIR=/mnt/pvfs2
>                ~/rename06
> 
>                The same suite of tests runs fine on a 2.6.24 kernel and a
>                trunk build of PVFS.  I'm not sure yet if the difference is
>                between pvfs versions or between kernel versions.
> 
> 
>            The rename06 test passes with pvfs trunk; I think that particular
>            problem has already been fixed.  I still haven't figured out why
>            openfile01 would be a problem, though.
> 
>            -Phil
> 
> 
> 
> 
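
For anyone who wants to exercise the same access pattern without the
LTP binary, here is a minimal sketch of what "openfile -f10 -t10" does
as described in the thread above: each of 10 threads opens the same
file 10 times, so 100 descriptors are open at once. This is not the
LTP source. The function name open_many, its parameters, and the
scratch file are assumptions made for illustration, and the sketch
follows the "if I understand correctly" reading of the test.

```python
import os
import tempfile
import threading


def open_many(path, n_threads=10, opens_per_thread=10):
    """Open `path` opens_per_thread times from each of n_threads threads.

    Returns how many descriptors were open at the same time.
    """
    fds = []                                  # descriptors from all threads
    lock = threading.Lock()                   # protects `fds`
    barrier = threading.Barrier(n_threads)    # release all threads together

    def worker():
        barrier.wait()
        # Each thread opens the same file repeatedly and keeps every fd.
        mine = [os.open(path, os.O_RDONLY) for _ in range(opens_per_thread)]
        with lock:
            fds.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = len(fds)          # nothing has been closed yet at this point
    for fd in fds:
        os.close(fd)
    return total


if __name__ == "__main__":
    # Hypothetical scratch file; pass dir="/mnt/pvfs2" to NamedTemporaryFile
    # to place it on the mount under test instead of the default temp dir.
    with tempfile.NamedTemporaryFile() as f:
        print(open_many(f.name))
```

Running this from the mounted file system and then issuing "ls -alh"
mirrors the manual steps quoted above. It only reproduces the open
pattern; it says nothing about whether the hang itself will occur.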


