[PVFS-developers] Relocatable Metadata

Porter Don PorterDE@mercury.hendrix.edu
Wed, 24 Sep 2003 17:40:35 -0500


 Thanks Rob!

I was planning on more of a master/slave architecture where only one would
be in active use unless the other went down.  I am still working on the
synchronization mechanism, but I wanted to remove this hurdle before I
bothered synchronizing non-portable data.

So, does it matter if the fs inode (the inode of the .pvfsdir file) is the
same as that of a file?  So far it seems to be used only within the manager
to tell whether a fs is mounted.

What about directory inodes?  It doesn't seem that they are really used for
much either, but perhaps the kernel module does use them and might get
confused if there was a collision.  Directories would not be hard to add to
this mechanism.  

Thanks for the feedback,
Don

-----Original Message-----
From: Rob Ross
To: Porter Don
Cc: 'pvfs-developers@www.beowulf-underground.org'
Sent: 9/24/03 4:18 PM
Subject: Re: [PVFS-developers] Relocatable Metadata

On Wed, 24 Sep 2003, Porter Don wrote:

> I have been looking at setting up redundancy and/or failover for pvfs nodes.
> Obviously the use of the local filesystem's inode numbers prevents direct
> copying of metadata files to another machine, lest there be a new file with
> the same inode an old one had.

Looking at the bigger picture, how are you planning on keeping these two
copies synchronized?

> I was wondering what reasons went into the decision to use fs inodes as the
> iod indices other than easy management of the used/free indices?

I was looking for an easy way to get unique values.  The decision was made
a very long time ago, well before I thought that this stuff was going to
actually be *used* anywhere :(.

> I have been experimenting with writing some code to dole out indices when
> files are created and keep a table of used/free indices.  Upon cursory
> inspection, this seems to work - allowing metadata (and the table) to be
> copied to another machine.

Yes, that should work fine.
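
As an illustration of the used/free index table discussed above, here is a
minimal sketch in Python.  It is not taken from the PVFS source; the class
name IndexTable and its methods are hypothetical, and a bitmap with a linear
scan is only one assumed way to track the indices.  The point it shows is
that the table is plain data, so it can be copied to another machine along
with the metadata.

```python
class IndexTable:
    """Bitmap of used/free indices, independent of local inode numbers.

    Hypothetical sketch: names and layout are assumptions, not PVFS code.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.bits = bytearray((capacity + 7) // 8)  # one bit per index

    def is_used(self, idx):
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))

    def allocate(self):
        """Mark the lowest free index as used and return it."""
        for idx in range(self.capacity):
            if not self.is_used(idx):
                self.bits[idx >> 3] |= 1 << (idx & 7)
                return idx
        raise RuntimeError("no free indices")

    def release(self, idx):
        """Return an index to the free pool (e.g. when a file is removed)."""
        if not self.is_used(idx):
            raise ValueError("index %d is not in use" % idx)
        self.bits[idx >> 3] &= ~(1 << (idx & 7)) & 0xFF

    def to_bytes(self):
        """Serialized form that can be shipped with the metadata files."""
        return bytes(self.bits)

    @classmethod
    def from_bytes(cls, capacity, data):
        """Rebuild the table on another machine from its serialized form."""
        if len(data) != (capacity + 7) // 8:
            raise ValueError("bitmap length does not match capacity")
        table = cls(capacity)
        table.bits = bytearray(data)
        return table
```

A table rebuilt with from_bytes() on a second machine continues handing out
indices where the first one left off, which is the property the local inode
numbers lack.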

I've seen another solution proposed in a paper that involved using a hash
of the file name (since we don't have links this is a 1 to 1 mapping), but
they didn't seem to handle directory renames in any reasonable way.

I like your solution better.
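
To make the directory-rename problem with name hashing concrete, here is a
small Python sketch.  It assumes the index is derived from a hash of the full
path name; the paper's actual scheme is not described in this thread, and the
function names and the choice of SHA-1 are illustrative assumptions only.

```python
import hashlib


def path_index(path):
    """Derive a 64-bit index from the full path name (assumed scheme)."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def indices_after_rename(paths, old_dir, new_dir):
    """Map old index -> new index for every file under a renamed directory."""
    prefix = old_dir.rstrip("/") + "/"
    changes = {}
    for path in paths:
        if path.startswith(prefix):
            renamed = new_dir.rstrip("/") + "/" + path[len(prefix):]
            changes[path_index(path)] = path_index(renamed)
    return changes
```

Under this assumption, renaming one directory changes the index of every
file beneath it, so each of those files would need its data re-keyed on the
I/O nodes.  An index handed out from a table stays with the file no matter
what its path becomes.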

> I suppose my question, then, is are there design considerations that I am
> missing in this approach and how does everyone feel about such an idea?

Again, I think it's fine.  Be very careful in the meta directory though;
that code is extremely fragile.

> Granted, it would slow the manager down some, but perhaps easier
> backup/redundancy might be worth the trade.

Sure.  You can preallocate a few of them anyway, then have a simple
recovery scheme for figuring out if preallocated ones were used or not at
startup to handle failures.  No reason why it can't be reasonably fast.
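
One way the preallocation and startup recovery described above could look is
sketched below in Python.  Everything here is an assumption for illustration:
the Manager class, the "disk" dict standing in for persistent state, and the
batch size are hypothetical, not PVFS code.  The idea is that the table is
updated once per batch rather than once per file, and at startup the reserved
indices are checked against the metadata to see which ones were actually
consumed before the failure.

```python
class Manager:
    """Hands out indices from a preallocated batch (hypothetical sketch)."""

    def __init__(self, disk, batch=4):
        self.disk = disk                    # dict standing in for on-disk state
        self.batch = batch
        disk.setdefault("used", set())      # indices known to be in use
        disk.setdefault("reserved", set())  # preallocated, fate unknown
        disk.setdefault("next", 0)          # first never-reserved index
        disk.setdefault("files", {})        # metadata: file name -> index
        self.pool = []
        self._recover()

    def _recover(self):
        """At startup, settle which preallocated indices were really used."""
        claimed = set(self.disk["files"].values())
        self.disk["used"] |= self.disk["reserved"] & claimed
        self.pool = sorted(self.disk["reserved"] - claimed)
        self.disk["reserved"] = set(self.pool)

    def _refill(self):
        """Reserve a whole batch with a single update to the table."""
        start = self.disk["next"]
        fresh = list(range(start, start + self.batch))
        self.disk["next"] = start + self.batch
        self.disk["reserved"] |= set(fresh)
        self.pool.extend(fresh)

    def create(self, name):
        """Create a file; only the file's own metadata records the index."""
        if not self.pool:
            self._refill()
        idx = self.pool.pop(0)
        self.disk["files"][name] = idx
        return idx
```

Constructing a new Manager on the same state after a crash runs the recovery
step, so indices that were reserved but never attached to a file go back
into the pool instead of being leaked.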

> Further, the time for a few calculations is nothing compared to the time
> it takes for data to travel on the network anyway.  Also, to represent
> the available inodes on a 73 GB ext2 disk (for instance), we are only
> talking about around 950k (~1%).

Sounds a-ok.  This is a pretty big change though, so I'll probably not 
jump on integrating it right away -- I'd like to see it in use a little 
more first.
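
The size estimate quoted above can be reasoned about with a short calculation:
a table that spends one bit per index needs one byte per eight indices.  The
Python helpers below are a sketch with hypothetical names.  The number of
inodes on a real ext2 disk depends on the bytes-per-inode ratio chosen when
the filesystem was made, which is not stated in this thread, so that ratio is
left as a parameter and the 950k figure is not reproduced here.

```python
def inode_count(disk_bytes, bytes_per_inode):
    """Number of inodes if one is created per bytes_per_inode of disk."""
    return disk_bytes // bytes_per_inode


def bitmap_bytes(n_indices):
    """Bytes needed for a used/free bitmap with one bit per index."""
    return (n_indices + 7) // 8
```

For example, bitmap_bytes(inode_count(disk_bytes, ratio)) gives the table size
for a given disk size and ratio.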

Regards,

Rob