[PVFS2-developers] last email
Rob Ross
rross at mcs.anl.gov
Thu Mar 11 14:33:15 EST 2004
Hey,
Thanks for the summary, checking things out, and all that. Very cool that
the kernel stuff is doing the right thing!
This would probably be a good test to add to your test script, so we can
tell if somehow we mess this up at some point. You could use /bin/true
and /bin/false as your binaries, which have very deterministic output :).
Anyway, just a thought.
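Something along these lines might do, as a rough sketch only. MNT, REMOTE,
and the file name swaptest are placeholders that are not part of any existing
test script: MNT would point at the pvfs2 mount, and REMOTE would be a command
prefix (ssh to a second node, say) so that the overwrite bypasses the local
VFS. With the defaults it just runs in a temp directory, which checks the
script logic but not pvfs2 itself. It also assumes /bin/true and /bin/false
are standalone binaries that still work when copied under another name.

```shell
#!/bin/sh
# Sketch: does an externally replaced binary run correctly the FIRST time?
# MNT and REMOTE are hypothetical knobs; defaults keep everything local.
MNT="${MNT:-$(mktemp -d)}"
REMOTE="${REMOTE:-}"          # e.g. REMOTE="ssh node2" (assumed setup)
target="$MNT/swaptest"

# Step 1: install /bin/true and run it once on this node.
cp /bin/true "$target"
"$target"; first=$?

# Step 2: overwrite it with /bin/false, ideally from another node so this
# client has no local way of knowing the file changed.
$REMOTE cp /bin/false "$target"

# Step 3: run it again right away. A stale copy exits 0; a half-stale
# copy may crash (exit status 139 after a segfault); a correct one exits 1.
"$target"; second=$?

rm -f "$target"

if [ "$first" -eq 0 ] && [ "$second" -eq 1 ]; then
    result=PASS
else
    result=FAIL
fi
echo "$result: first=$first second=$second"
```

Only the first run after the swap is checked, since that is the one that
would catch a late cache flush.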
Rob
On Thu, 11 Mar 2004 neillm at mcs.anl.gov wrote:
> On Thu, Mar 11, 2004 at 10:04:52AM -0600, neillm at mcs.anl.gov wrote:
> > On Thu, Mar 11, 2004 at 09:58:25AM -0600, Rob Ross wrote:
> > > We should be able to reproduce the behavior by overwriting executables
> > > between runs on one client w/out going through the VFS on that client (so
> > > that the client has no way of knowing locally that the file has changed).
> >
> > Ok, now I think I understand. I'll test this exact case to be sure at
> > some point (by running a locally copied 'ls' on node 1, overwriting
> > that binary with 'df' or something on node 2, and running the 'ls'
> > again on node 1), but I'm almost certain this is not a problem in
> > pvfs2 (unless I've made a big mistake and convinced myself otherwise).
>
> Ok, to follow up on this, I looked at this for a second and found good
> and bad news (and I'll end with what I consider to be better news).
>
> - From the VFS point of view, we're fine and are doing the right
> thing. We do not have stale data in the kernel's page cache between
> mmaps or executions.
>
> - BUT... This doesn't exactly work 'as is' in current pvfs2-0.1.1.
> Why? It's not the VFS, it's a bug in the mmap-readahead code (I
> *knew* there was something fishy about it!). While the kernel level
> page cache doesn't keep bunk data around, we're not flushing the
> user space cache at the right time, so the first time the externally
> swapped binary is run it fails, but subsequent runs are fine. (We
> currently have a slightly deferred mmap-readahead flush - that's why
> it works the second time but not the first).
>
> ================
> pvfs2test1:~# cp `which ls` /mnt/pvfs2/ls
> pvfs2test1:~# /mnt/pvfs2/ls -l
> total 0
> ================
>
> ... externally replace the 'ls' program with 'df' from a different
> pvfs2 node here ...
>
> ================
> pvfs2test1:~# /mnt/pvfs2/ls -l
> Segmentation fault
> pvfs2test1:~# /mnt/pvfs2/ls -l
> Filesystem            1K-blocks      Used  Available Use% Mounted on
> /dev/ide/host0/bus0/target0/lun0/part2
>                         4806936   1990124    2572624  44% /
> /dev/md/0             461524488     47612  438032732   1% /shared
> tmpfs                    257648         0     257648   0% /dev/shm
> pvfs2                1314145992     99780 1314046212   1% /mnt/pvfs2
> pvfs2test1:~#
> ================
>
> - NOTE: To verify it's definitely the mmap-readahead cache code, I've
> disabled the mmap-readahead cache entirely and it works as expected
> (i.e. a transparent replacement with no segfault at first)
>
> The better news: it's easy to fix the mmap-readahead code in this
> case. We just need to tell it explicitly to flush the data for that
> file when we flush the file's page cache data. (I'll be fixing this
> shortly between a few other juggles).
>
> Thanks for bringing up this problem though! I like to say with
> confidence that we support certain things. This is a good test for
> the mmap-readahead code as well, and of course the fewer bugs anywhere,
> the better! ;-)
>
> -Neill.
>
>