[PVFS2-developers] last email
neillm at mcs.anl.gov
Thu Mar 11 14:18:08 EST 2004
On Thu, Mar 11, 2004 at 10:04:52AM -0600, neillm at mcs.anl.gov wrote:
> On Thu, Mar 11, 2004 at 09:58:25AM -0600, Rob Ross wrote:
> > We should be able to reproduce the behavior by overwriting executables
> > between runs on one client w/out going through the VFS on that client (so
> > that the client has no way of knowing locally that the file has changed).
>
> Ok, now I think I understand. I'll test this exact case to be sure at
> some point (by running a locally copied 'ls' on node 1, overwriting
> that binary with 'df' or something on node 2, and running the 'ls'
> again on node 1), but I'm almost certain this is not a problem in
> pvfs2 (unless I've made a big mistake and convinced myself otherwise).
Ok, to follow up on this: I looked into it briefly and found good news
and bad news (and I'll end with what I consider to be better news).
- From the VFS point of view, we're fine and are doing the right
thing. We do not have stale data in the kernel's page cache between
mmaps or executions.
- BUT... this doesn't exactly work 'as is' in the current pvfs2-0.1.1.
Why? It's not the VFS; it's a bug in the mmap-readahead code (I
*knew* there was something fishy about it!). While the kernel-level
page cache doesn't keep bunk data around, we're not flushing the
user-space cache at the right time, so the first time the externally
swapped binary is run it fails, but subsequent runs are fine. (We
currently have a slightly deferred mmap-readahead flush; that's why
it works the second time but not the first.)
================
pvfs2test1:~# cp `which ls` /mnt/pvfs2/ls
pvfs2test1:~# /mnt/pvfs2/ls -l
total 0
================
... externally replace the 'ls' program with 'df' from a different
pvfs2 node here ...
================
pvfs2test1:~# /mnt/pvfs2/ls -l
Segmentation fault
pvfs2test1:~# /mnt/pvfs2/ls -l
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/ide/host0/bus0/target0/lun0/part2
                       4806936   1990124   2572624  44% /
/dev/md/0            461524488     47612 438032732   1% /shared
tmpfs                   257648         0    257648   0% /dev/shm
pvfs2               1314145992     99780 1314046212   1% /mnt/pvfs2
pvfs2test1:~#
================
- NOTE: To verify that it's definitely the mmap-readahead cache code, I
disabled the mmap-readahead cache entirely, and it works as expected
(i.e., the binary is replaced transparently, with no segfault on the first run).
The better news: it's easy to fix the mmap-readahead code in this
case. We just need to tell it explicitly to flush the data for that
file when we flush the file's page cache data. (I'll be fixing this
shortly, in between juggling a few other things.)
Thanks for bringing up this problem, though! I like to be able to say
with confidence that we support certain things. This is a good test for
the mmap-readahead code as well, and of course, the fewer bugs anywhere,
the better! ;-)
-Neill.