[PVFS2-developers] last email

Rob Ross rross at mcs.anl.gov
Thu Mar 11 14:33:15 EST 2004


Thanks for the summary, checking things out, and all that.  Very cool that 
the kernel stuff is doing the right thing!

This would probably be a good test to add to your test script, so we can
tell if somehow we mess this up at some point.  You could use /bin/true
and /bin/false as your binaries, which have very deterministic output :).  
Anyway, just a thought.
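
Something along these lines might do as a starting point.  This is only a
rough sketch, not what the test script actually does: check_swap, TRUE_BIN,
FALSE_BIN and REPLACE_CMD are names made up here, and REPLACE_CMD is assumed
to be some command that overwrites the file from a *different* client (for
example an ssh to a second node).  Without it the script falls back to a
local cp, which goes through the local VFS and so only checks the script
itself, not the stale-data case.

```shell
#!/bin/sh
# Sketch of a stale-executable check using true/false as the test binaries.
# All names below are hypothetical; adjust paths for the system under test.

TRUE_BIN=${TRUE_BIN:-/bin/true}
FALSE_BIN=${FALSE_BIN:-/bin/false}

check_swap() {
    dir=$1                          # directory on the file system under test
    bin="$dir/swap_test_bin"

    # Step 1: install a copy of true and run it once on this client.
    cp "$TRUE_BIN" "$bin" || return 2
    "$bin" || { echo "FAIL: copy of true did not exit 0"; rm -f "$bin"; return 1; }

    # Step 2: overwrite the binary with false, ideally from another node.
    # REPLACE_CMD is deliberately unquoted so it can hold a command plus args.
    if [ -n "${REPLACE_CMD:-}" ]; then
        $REPLACE_CMD "$FALSE_BIN" "$bin" || return 2
    else
        cp "$FALSE_BIN" "$bin" || return 2    # local fallback, see note above
    fi

    # Step 3: the first run after the swap must behave like false (exit 1),
    # not like true (stale data) and not die on a signal (status > 128).
    status=0
    "$bin" || status=$?
    rm -f "$bin"

    if [ "$status" -eq 1 ]; then
        echo "PASS"
        return 0
    fi
    echo "FAIL: exit status $status on first run after swap"
    return 1
}
```

It would be called as "check_swap /mnt/pvfs2" on the first node, with
REPLACE_CMD set to whatever performs the copy from the second node.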


On Thu, 11 Mar 2004 neillm at mcs.anl.gov wrote:

> On Thu, Mar 11, 2004 at 10:04:52AM -0600, neillm at mcs.anl.gov wrote:
> > On Thu, Mar 11, 2004 at 09:58:25AM -0600, Rob Ross wrote:
> > > We should be able to reproduce the behavior by overwriting executables
> > > between runs on one client w/out going through the VFS on that client (so
> > > that the client has no way of knowing locally that the file has changed).
> > 
> > Ok, now I think I understand.  I'll test this exact case to be sure at
> > some point (by running a locally copied 'ls' on node 1, overwriting
> > that binary with 'df' or something on node 2, and running the 'ls'
> > again on node 1), but I'm almost certain this is not a problem in
> > pvfs2 (unless I've made a big mistake and convinced myself otherwise).
> Ok, to follow up on this, I looked at this for a second and found good
> and bad news (which I'll follow with what I consider to be better news).
> - From the VFS point of view, we're fine and are doing the right
>   thing.  We do not have stale data in the kernel's page cache between
>   mmaps or executions.
> - BUT...  This doesn't exactly work 'as is' in current pvfs2-0.1.1.
>   Why?  It's not the VFS, it's a bug in the mmap-readahead code (I
>   *knew* there was something fishy about it!).  While the kernel-level
>   page cache doesn't keep bunk data around, we're not flushing the
>   user-space cache at the right time, so the first time the externally
>   swapped binary is run it fails, but subsequent runs are fine.  (We
>   currently have a slightly deferred mmap-readahead flush - that's why
>   it works the second time but not the first.)
> ================
> pvfs2test1:~# cp `which ls` /mnt/pvfs2/ls
> pvfs2test1:~# /mnt/pvfs2/ls -l
> total 0
> ================
> ... externally replace the 'ls' program with 'df' from a different
> pvfs2 node here ...
> ================
> pvfs2test1:~# /mnt/pvfs2/ls -l
> Segmentation fault
> pvfs2test1:~# /mnt/pvfs2/ls -l
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/ide/host0/bus0/target0/lun0/part2
>                        4806936   1990124   2572624  44% /
> /dev/md/0            461524488     47612 438032732   1% /shared
> tmpfs                   257648         0    257648   0% /dev/shm
> pvfs2                1314145992     99780 1314046212   1% /mnt/pvfs2
> pvfs2test1:~# 
> ================
> - NOTE: To verify it's definitely the mmap-readahead cache code, I've
>   disabled the mmap-readahead cache entirely and it works as expected
>   (i.e., a transparent replacement with no segfault on the first run).
> The better news: it's easy to fix the mmap-readahead code in this
> case.  We just need to tell it explicitly to flush the data for that
> file when we flush the file's page cache data.  (I'll be fixing this
> shortly between a few other juggles).
> Thanks for bringing up this problem though!  I like to say with
> confidence that we support certain things.  This is a good test for
> the mmap-readahead code as well, and of course the fewer bugs anywhere,
> the better!  ;-)
> -Neill.
