[PVFS2-developers] last email

neillm at mcs.anl.gov
Thu Mar 11 14:18:08 EST 2004


On Thu, Mar 11, 2004 at 10:04:52AM -0600, neillm at mcs.anl.gov wrote:
> On Thu, Mar 11, 2004 at 09:58:25AM -0600, Rob Ross wrote:
> > We should be able to reproduce the behavior by overwriting executables
> > between runs on one client w/out going through the VFS on that client (so
> > that the client has no way of knowing locally that the file has changed).
> 
> Ok, now I think I understand.  I'll test this exact case to be sure at
> some point (by running a locally copied 'ls' on node 1, overwriting
> that binary with 'df' or something on node 2, and running the 'ls'
> again on node 1), but I'm almost certain this is not a problem in
> pvfs2 (unless I've made a big mistake and convinced myself otherwise).

Ok, to follow up on this, I looked into it briefly and found good news
and bad news (and I'll end with what I consider to be even better news).

- From the VFS point of view, we're fine and are doing the right
  thing.  We do not have stale data in the kernel's page cache between
  mmaps or executions.

- BUT...  This doesn't exactly work 'as is' in the current pvfs2-0.1.1.
  Why?  It's not the VFS; it's a bug in the mmap-readahead code (I
  *knew* there was something fishy about it!).  While the kernel-level
  page cache doesn't keep bunk data around, we're not flushing the
  user-space cache at the right time, so the first time the externally
  swapped binary is run, it fails, but subsequent runs are fine.  (We
  currently have a slightly deferred mmap-readahead flush; that's why
  it works the second time but not the first.)

================
pvfs2test1:~# cp `which ls` /mnt/pvfs2/ls
pvfs2test1:~# /mnt/pvfs2/ls -l
total 0
================

... externally replace the 'ls' program with 'df' from a different
pvfs2 node here ...

================
pvfs2test1:~# /mnt/pvfs2/ls -l
Segmentation fault
pvfs2test1:~# /mnt/pvfs2/ls -l
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part2
                       4806936   1990124   2572624  44% /
/dev/md/0            461524488     47612 438032732   1% /shared
tmpfs                   257648         0    257648   0% /dev/shm
pvfs2                1314145992     99780 1314046212   1% /mnt/pvfs2
pvfs2test1:~# 
================

- NOTE: To verify that it's definitely the mmap-readahead cache code,
  I disabled the mmap-readahead cache entirely, and it works as expected
  (i.e., the replacement is transparent and even the first run does not
  segfault).
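To make the "works the second time but not the first" symptom concrete, here is a toy model of a user-space readahead cache whose flush is deferred until after the next read has been served.  This is NOT the pvfs2 code: the class, its methods, and the string stand-ins for the binaries are all hypothetical, and the deferral is modeled as "drop the cached copy after serving the read", which is just one way such a delay could produce the behavior seen in the transcript.

```python
# Hypothetical model of a user-space readahead cache with a *deferred* flush.
# None of these names come from pvfs2; they only reproduce the symptom.

class DeferredFlushCache:
    def __init__(self, backing):
        self.backing = backing        # file name -> current contents on the servers
        self.cached = {}              # file name -> cached (possibly stale) copy
        self.flush_pending = set()    # files whose flush was requested but not done

    def invalidate(self, name):
        # Called when the kernel drops its page cache for the file; here the
        # drop of our own copy is only recorded, not performed yet.
        self.flush_pending.add(name)

    def read(self, name):
        if name not in self.cached:
            self.cached[name] = self.backing[name]  # miss: fetch fresh data
        data = self.cached[name]
        if name in self.flush_pending:
            # The deferred flush runs only after this read was already served.
            del self.cached[name]
            self.flush_pending.discard(name)
        return data


servers = {"ls": "ls-binary"}
cache = DeferredFlushCache(servers)

run1 = cache.read("ls")        # fills the cache with "ls-binary"
servers["ls"] = "df-binary"    # binary overwritten from another node
cache.invalidate("ls")         # page cache flushed on the next exec
run2 = cache.read("ls")        # still "ls-binary": the failing first run
run3 = cache.read("ls")        # "df-binary": later runs are fine
print(run1, run2, run3)
```

In this model the kernel side behaves correctly (it signals the invalidation), yet the first read after the swap still returns the old contents, which mirrors the segfault on the first run and the correct `df` output on the second.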

The better news: it's easy to fix the mmap-readahead code in this
case.  We just need to tell it explicitly to flush the data for a
file whenever we flush that file's page cache data.  (I'll be fixing
this shortly, between a few other juggles.)
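A minimal sketch of that fix, in the same toy terms: the readahead copy for a file is dropped immediately, at the same point where the file's page cache is flushed, instead of being deferred.  Again, `ReadaheadCache` and `flush_page_cache` are hypothetical stand-ins, not pvfs2 functions; the only idea taken from the description above is tying the two flushes together.

```python
# Hypothetical sketch of the fix: when a file's page cache is flushed,
# drop the readahead copy for that same file right away.

class ReadaheadCache:
    def __init__(self, backing):
        self.backing = backing   # file name -> current contents on the servers
        self.cached = {}         # file name -> cached copy

    def read(self, name):
        if name not in self.cached:
            self.cached[name] = self.backing[name]  # miss: fetch fresh data
        return self.cached[name]

    def flush(self, name):
        # Explicit, immediate flush of this file's readahead data;
        # flushing a file that was never cached is a harmless no-op.
        self.cached.pop(name, None)


def flush_page_cache(name, readahead):
    # Hypothetical hook standing in for the point where the client drops
    # the file's page cache; the readahead flush is tied directly to it.
    readahead.flush(name)


servers = {"ls": "ls-binary"}
cache = ReadaheadCache(servers)

run1 = cache.read("ls")         # cached "ls-binary"
servers["ls"] = "df-binary"     # binary overwritten from another node
flush_page_cache("ls", cache)   # both caches flushed together
run2 = cache.read("ls")         # fresh "df-binary" on the first run after the swap
```

Because nothing is left pending after the flush, there is no window in which a stale copy can be served, which is the transparent behavior seen when the readahead cache is disabled.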

Thanks for bringing up this problem, though!  I like to be able to say
with confidence that we support certain things.  This is a good test for
the mmap-readahead code as well, and of course the fewer bugs anywhere,
the better!  ;-)

-Neill.

