interface madness (was Re: [PVFS-developers] another vote)
Robert Ross
rross@mcs.anl.gov
Tue, 24 Oct 2000 16:41:29 -0500 (CDT)
On Tue, 24 Oct 2000, Walter B. Ligon III wrote:
> > Use the kernel module to test your code.
>
> Yeah, I'm not impressed. Still a pain. I'm all for making things clean,
> but let's not make life more of a pain for the programmer.
I'm not sure that it's at all clear what application programmers are doing
with PVFS. I get the impression that they are mostly either just using
the kernel interface or taking some app they have already written to use
UNIX I/O and going in and stuffing pvfs_ in front of all their calls.
In either of those cases, debugging was done with UNIX I/O or the kernel
module.
All this is moot though; we're going to keep the UNIX support in the
pvfs_xxx calls.
> I mean, in some sense you ARE proposing a different interface - you are
> proposing the pvfs_xxx interface. What I meant was if the pvfs_xxx interface
> is distinct from the standard unix interface, then a file descriptor returned
> from pvfs_open() will not work with read(). If that's the case, then the
> shadow stuff ISN'T really needed.
I see your point, and you're correct that the shadow stuff wouldn't have
to exist in its present form. However, there would still have to be
something that maps between PVFS "fds" and the various UNIX fds and PVFS
structures. We would also need to continue to spit out plain old integers
to aid in porting (which is a previously stated goal). We could rewrite
this stuff, however, and it might turn out nicer. I would rather expend
the effort on something else though.
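For illustration only, here is a rough sketch of the kind of mapping described above: a small table that hands out plain integers and records whether each one stands for a real UNIX fd or for PVFS-side state. Every name in it (ex_table, ex_alloc, ex_lookup, ex_free, EX_MAX_FDS) is hypothetical and not taken from the PVFS source; the fixed table size and lowest-free-slot policy are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is taken from the PVFS source. */
#define EX_MAX_FDS 64

enum ex_kind { EX_UNUSED = 0, EX_UNIX, EX_PVFS };

struct ex_entry {
    enum ex_kind kind;   /* what this slot refers to */
    int unix_fd;         /* real descriptor when kind == EX_UNIX, else -1 */
    void *pvfs_state;    /* library-private state when kind == EX_PVFS */
};

/* Zero-initialized, so every slot starts out as EX_UNUSED. */
static struct ex_entry ex_table[EX_MAX_FDS];

/* Hand out the lowest free slot as a plain int, or -1 if the table is full. */
static int ex_alloc(enum ex_kind kind, int unix_fd, void *pvfs_state)
{
    assert(kind != EX_UNUSED);
    for (int i = 0; i < EX_MAX_FDS; i++) {
        if (ex_table[i].kind == EX_UNUSED) {
            ex_table[i].kind = kind;
            ex_table[i].unix_fd = unix_fd;
            ex_table[i].pvfs_state = pvfs_state;
            return i;
        }
    }
    return -1;
}

/* Look up a slot; returns NULL for out-of-range or unused descriptors. */
static struct ex_entry *ex_lookup(int fd)
{
    if (fd < 0 || fd >= EX_MAX_FDS || ex_table[fd].kind == EX_UNUSED)
        return NULL;
    return &ex_table[fd];
}

/* Release a slot so the integer can be reused. Returns 0 on success, -1 otherwise. */
static int ex_free(int fd)
{
    struct ex_entry *e = ex_lookup(fd);
    if (e == NULL)
        return -1;
    e->kind = EX_UNUSED;
    e->unix_fd = -1;
    e->pvfs_state = NULL;
    return 0;
}
```

A read-style call would start with ex_lookup(fd) and branch on the kind field: pass unix_fd to the ordinary system call in one case, use pvfs_state to talk to the servers in the other. The caller only ever sees an int, which is what keeps porting easy.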
> > Some simplification can take place if/when we remove the wrappers;
> > specifically all the uses of syscall() can be removed, and some code to
> > map between kernel data structures (i.e. dirent, struct) and glibc data
> > structures can be removed (VERY CAREFULLY). That will help a lot in
> > simplifying, reducing problems between kernel and glibc versions, and
> > aiding portability (there are a number of people interested in seeing
> > Alpha Linux and Alpha Tru64 work, and I'd like to satisfy them).
>
> OK, I'm not sure exactly what you mean here. I see the biggest problem with
> stat() calls that actually fill in a structure. I guess the getdents() does
> also? So are we proposing to stay true to the existing form of these structs
> or will we allow them to vary? If they are the same then where do we get to
> save code? I must admit I'm a little fuzzy on this. Been a while since I
> saw that code.
Ok, let me summarize the problem here. The structures returned from the
kernel as the result of a stat() (or related) call are not the same as the
ones defined in sys/stat.h. So we had to manually map back and forth
before, and the mapping has been different for different kernel/glibc
combinations. Now we can just use the glibc calls and we know what the
structure will look like in all cases.
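To make the "manual mapping" concrete, here is a minimal sketch of what that kind of code tends to look like. The struct ex_kernel_stat layout below is a made-up stand-in, not any real kernel definition, and ex_kstat_to_stat is a hypothetical helper name; the point is only that each field has to be copied by hand, and that the source layout is the part that shifts between kernel/glibc combinations.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for a kernel-side stat layout. Field names, widths,
 * and order are assumptions for illustration, not a real kernel definition. */
struct ex_kernel_stat {
    unsigned short dev;
    unsigned long ino;
    unsigned short mode;
    unsigned short nlink;
    unsigned short uid, gid;
    unsigned long size;
    unsigned long mtime;
};

/* Copy field by field into the struct stat declared in sys/stat.h.
 * Fields the stand-in layout does not carry are left zeroed. */
static void ex_kstat_to_stat(const struct ex_kernel_stat *k, struct stat *s)
{
    memset(s, 0, sizeof(*s));
    s->st_dev = k->dev;
    s->st_ino = k->ino;
    s->st_mode = k->mode;
    s->st_nlink = k->nlink;
    s->st_uid = k->uid;
    s->st_gid = k->gid;
    s->st_size = (off_t)k->size;
    s->st_mtime = (time_t)k->mtime;
}
```

Calling stat() through glibc removes the need for a helper like this entirely, since the library hands back struct stat in the layout the application was compiled against.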
With respect to getdents(), looking at the code again I think I knocked
that problem out a little while back by moving some of the work into the
manager. It was a similar problem.
> Another thing I thought of - one of the original reasons for the wrapper
> code was so we could use stdio on pvfs files. Getting rid of the wrappers
> will force the use of the kernel interface for that. Maybe that's not a
> big deal.
>
> Maybe we should implement a buffered I/O library specifically for the pvfslib.
Again, we should figure out if anyone even CARES about this, and if they
don't then we should spend our effort somewhere else. If you're doing
buffered I/O, the kernel interface might be adequate.
> > It's definitely possible, but here are some reasons I'd like to avoid it:
> > 1) it's slower to go through the kernel
>
> Hmmm. That shouldn't be much of an issue. I mean how often does one do
> unlink(), mkdir(), chmod(), etc. on files, and who cares if they are a little
> slow?
Well, I can think of times when this is done a lot (e.g. tar), but in
those cases I/O is going to go through the kernel anyway. So maybe that
wasn't the best point.
> > 2) this makes the library even MORE complex:
> > - does this file exist on this machine?
> > - if so, is it really a PVFS file?
> > - if not, what mgr might I want to contact?
>
> Well, not really. I mean, most of this stuff has to be in there anyway.
> I mean, suppose we treated pvfs_open() kind of like fopen(). It returns a
> file descriptor that is only useful with pvfs_read() and pvfs_write(), etc.
> Within pvfs_open() we call open() to open the file, and stash the unix file
> descriptor. Subsequent calls that are not data I/O are simply translated
> into corresponding calls to the normal library calls. Calls to pvfs_read()
> and pvfs_write() go directly to the IOD. We don't even have to implement
> calls that take a file name for an argument (like pvfs_unlink()) cause
> they already work via the kernel interface (just need a quick wrapper to
> add pvfs_ to the name of the func).
I'm not exactly sure what your point is. Yes, we could do all this. We
could simplify the interface too to get rid of redundant calls. There is
definitely room for improvement.
I don't want to create a dependency on the kernel stuff in the base PVFS
code. As it is, you don't HAVE to have the kernel module loaded or the
file system mounted to get things done. That is good. And if we have the
capability to do things directly from the library, then why ALSO have the
option of going through the kernel? That's just extra code to do the same
thing. Plus the whole point of this originally was to be able to do
everything from user space, and we still CAN. Let's not give that up.
As an aside, the client-side daemon uses the calls that take filenames to
perform the majority of operations, because that avoids the overhead
associated with opening the file. Those are the simplest calls in the
bunch, and all they are is a quick wrapper in effect.
So, in a roundabout way, we're back to the question of which calls we
really need in order to access the file system, which was my point in the
next quoted section.
> > I foresee in the future of PVFS a better, cleaner interface to the servers.
> > With this interface will come a simpler library of calls which other
> > interfaces will build upon (e.g. UNIX interface, VFS interface, ROMIO).
> > Once there is a clear separation between the UNIX interface and the
> > calls that do PVFS I/O, it will be easier to handle all the special cases.
>
> Yeah, that's cool. But let's not screw the app programmer while we make the
> interfaces nice for the systems programmer. The cleanest, nicest interface
> is also the least common denominator, and that is not always the goal we should
> shoot for. I just want to explore some of the possibilities.
The point isn't to "screw the app programmer"; the point is to have
reasonable building blocks on which to build things instead of putting
ridiculous quantities of functionality into what is supposed to be an
interface to a remote file system. This switching between UNIX I/O and
PVFS I/O, all the file partitioning code, all this stuff is just overhead
that gets in the way of getting the highest performance in what I'm
guessing is the standard case: people accessing PVFS files without
partitioning through the native interface.
Sure, have partitioned file interfaces, UNIX I/O, MDBI, ROMIO, and all
that good stuff. Certainly we want to provide the right interfaces. But
let's build them out of pieces that make sense instead of ending up with
another monolithic library of calls. Our code will be easier to read and
use, and pieces you don't care about won't be in your way.
Rob
---
Rob Ross, Mathematics and Computer Science Division, Argonne National Lab