[Pvfs2-developers] ncache causes shared creat problems

Phil Carns pcarns at wastedcycles.org
Mon Aug 28 14:37:16 EDT 2006


I think so.  When one node deletes a file, it does not send out messages 
to invalidate the cache in all of the other clients, so those still have 
a cached (no longer valid) entry.

If those other clients then look up the file, the lookup will succeed (as 
if another client had won the race to create it), but when they try to 
access the file there will be an error because the handle in the cache 
is stale.
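
To make the sequence concrete, here is a rough sketch of it as a toy model. This is not PVFS2 code: the Server and Client classes, the demo() helper, and the explicit "now" timestamps are all made up for illustration, and the only thing it assumes about the real ncache is what is described above (entries are cached per client, expire after a timeout, and are not invalidated when another client deletes the file).

```python
import itertools


class Server:
    """Toy metadata server mapping names to handles (hypothetical)."""

    def __init__(self):
        self._names = {}                  # name -> handle
        self._live = set()                # handles that still exist
        self._next = itertools.count(1)

    def create(self, name):
        if name in self._names:
            raise FileExistsError(name)
        handle = next(self._next)
        self._names[name] = handle
        self._live.add(handle)
        return handle

    def lookup(self, name):
        return self._names[name]          # raises KeyError if missing

    def remove(self, name):
        self._live.discard(self._names.pop(name))

    def access(self, handle):
        if handle not in self._live:
            raise LookupError(f"stale handle {handle}")
        return "ok"


class Client:
    """Toy client with a local name cache and a timeout (hypothetical)."""

    def __init__(self, server, timeout):
        self.server = server
        self.timeout = timeout
        self.cache = {}                   # name -> (handle, time cached)

    def lookup(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]               # served locally, server not asked
        handle = self.server.lookup(name)
        self.cache[name] = (handle, now)
        return handle

    def remove(self, name):
        # Only the deleting client's cache is cleaned up.
        self.server.remove(name)
        self.cache.pop(name, None)


def demo(timeout):
    server = Server()
    a, b = Client(server, timeout), Client(server, timeout)
    server.create("f")
    b.lookup("f", now=0.0)                # B caches the handle
    a.remove("f")                         # A deletes; B is not told
    try:
        handle = b.lookup("f", now=1.0)
    except KeyError:
        return "lookup failed: B may create the file"
    try:
        server.access(handle)
    except LookupError:
        return "lookup succeeded, access failed: stale handle"
    return "ok"
```

In this model, demo(5.0) ends in the stale-handle case, while demo(0.5) lets the cached entry expire before B's second lookup, so B sees that the file is gone. A short timeout only narrows the window; it does not close it.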

There really isn't much of a way around this with the local ncache 
approach.  Maybe the stock release should ship with the ncache disabled 
if this workload will be common (one client deletes a particular file and 
then a different client immediately recreates a file with the same name), 
or maybe it should at least be disabled by default for system interface 
usage, since MPI programs are probably more likely to trigger this than 
VFS programs.

-Phil

Robert Latham wrote:
> On Mon, Aug 28, 2006 at 04:28:32PM -0400, Pete Wyckoff wrote:
> 
>>So yeah, the file gets deleted by just one task.  Then they all
>>simultaneously try to create it again.
> 
> 
> That's also what happens when noncontig_coll2 was failing.  We did ok
> until a different process tried to open the file that another process
> just deleted.  By turning the ncache timeout way down (not disabled, but
> set to a very short interval), the test would pass.  Guess the delete
> from one process wasn't visible (is that the right word?) to other
> processes.
> 
> ==rob
> 


