[Pvfs2-users] RE: lustre and pvfs comparison
robl at mcs.anl.gov
Thu Dec 21 17:15:17 EST 2006
On Wed, Dec 20, 2006 at 09:22:21AM -0600, Pappas, Bill wrote:
> I'm looking for some feedback from Lustre and PVFS users.
I asked Bill to raise this question on the PVFS list, so I'd better
respond to him :> Since this is "home field" so to speak, I won't
feel bad speaking positively about PVFS, but I'll try to be objective
with respect to Lustre's strengths and weaknesses. I read their
mailing lists, and I'm sure they read ours.
> Specifically ---I'm interested in any thoughts on why one would go
> to Lustre or PVFS for their HPC file system needs.
Here are a few PVFS strengths:
- (mostly) userspace design makes porting, supporting, installation,
and development much easier. The small kernel module we have lets
serial applications access PVFS conveniently.
- Tightly integrated driver in ROMIO (a widely deployed MPI-IO
implementation).
- Native support for noncontiguous data patterns (similar to MPI
datatypes), which are common in scientific applications.
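
To make "noncontiguous data pattern" concrete, here is a minimal sketch
in plain Python. It is not PVFS or ROMIO code: the function name
subarray_regions and its parameters are made up for illustration, and it
assumes a row-major 2D array of fixed-size elements stored contiguously
in one file. It lists the byte regions a process would touch when it
reads a small sub-block of that array. A file system with noncontiguous
support can take a description of all these regions as one request,
rather than one request per row.

```python
def subarray_regions(rows, cols, r0, c0, nrows, ncols, elem_size=8):
    """Return (byte_offset, byte_length) pairs covering a sub-block.

    The array is assumed to be rows x cols, row-major, stored
    contiguously from byte 0 of the file (an assumption for this sketch).
    """
    if r0 < 0 or c0 < 0 or r0 + nrows > rows or c0 + ncols > cols:
        raise ValueError("sub-block does not fit inside the array")
    regions = []
    for r in range(r0, r0 + nrows):
        # Each row of the sub-block is contiguous in the file, but
        # consecutive rows are separated by the rest of the full row.
        offset = (r * cols + c0) * elem_size
        regions.append((offset, ncols * elem_size))
    return regions


# A 4x4 block of doubles at row 2, column 2 of a 1024x1024 array.
regions = subarray_regions(1024, 1024, 2, 2, 4, 4)
for offset, length in regions:
    print(offset, length)
```

Four short regions separated by large gaps is typical of what a
scientific code generates when each process owns one tile of a larger
array.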
> What fundamentally makes pvfs different from lustre?
That's a hard question to answer without sounding either like a raving
Lustre hater or a zealous PVFS fan, but I'll give it a shot: Lustre
appears to be designed first and foremost to be a POSIX file system
which could also handle parallel I/O. PVFS was designed first and
foremost to be a fast, scalable filesystem for parallel I/O, which can
also handle serial I/O workloads. It's important to stress there are
workloads that are an excellent fit for Lustre, just as other workloads
excel on PVFS.
It seems like many people find the PVFS development and user community
fairly open. We don't require assignment of copyright from people
contributing patches. We have one CVS tree, and while it might have a
lot of active branches at a given time, there is no "commercial"
version of PVFS hidden from interested parties. We do our best to
handle development questions on public mailing lists. Several of us
hang out on IRC (#pvfs2 on irc.freenode.net) and answer questions when
people drop in.
> I realize that one may claim (that for specific requirements) Lustre
> or PVFS may be more suitable or just plain better. So....I'd like
> to know which requirement(s) led you to Lustre or PVFS?
If your typical application needs to scale to thousands of clients, or
you have applications that make use of MPI-IO (or higher level
libraries built on top of MPI-IO like parallel HDF5 or
Parallel-NetCDF), PVFS would be an excellent choice.
If your typical application is serial in nature, or you require strict
POSIX semantics, Lustre would be a good choice.
> I would definitely like to know any limitations you've seen in either
> fs. Installation complications? Scalability. Reliability. Speed.
I have not set up Lustre myself, but reports from many who have
suggest it is somewhat more involved than setting up PVFS. We require no
kernel patches for PVFS: we've taken great pains to compile our
kernel module standalone against many different kernel.org and vendor
kernels. As already pointed out in this thread, PVFS is pretty
portable to many different architectures. You can set up a test
installation of PVFS on top of any directory you like: no need for a
dedicated device (though it wouldn't hurt performance, of course).
We designed PVFS from the beginning to scale well. We've run on IBM
Watson's 16k-node Blue Gene system, and plan on deploying PVFS on some
of the large Argonne systems we've got in the works.
PVFS reliability seems to be pretty good in practice. Sure, we
definitely get bug reports, which we act on as quickly as possible
(again, being mostly userspace, we can debug a lot of issues quickly).
Hardware failure is responsible for a lot of the outages we see at
sites like Argonne.
Speed is decent and getting better. Earlier this year we completed an
examination of metadata performance and came up with some new
approaches to speed that up. We're currently working on optimizing
our I/O rates (they aren't horrible now, but can be improved).
So, that's the high-level discussion of PVFS. Would it make sense in
your environment? Well, the way I see it, it doesn't cost anything to
download and install PVFS, and the time commitment in setting it up
isn't that high either. If you have hardware available and you can
set up a test system, that might give you the best idea if PVFS is
right for you.
If you have any questions or would like to hear more about any of the
points above, feel free to ask.
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B