[Pvfs2-developers] problems creating new trove-simple interface

Sam Lang slang at mcs.anl.gov
Tue Jun 27 16:08:03 EDT 2006


Hi John,

I think the best place (others can correct me) to modify the pvfs2 code
so that the trove layer operates on logical offsets and sizes is the
flow code (flowproto_multiqueue.c).  My reasoning is that the flow
layer converts the PVFS_Request structure into the physical offsets
and sizes and passes them on to the trove layer.  The trove layer
doesn't care whether the offsets and sizes passed to it (via
trove_bstream_{read|write}_list) are logical or physical; in the
normal case it just uses those values to operate directly on the
bstream file.  In your case the offsets and sizes passed to trove
could be logical, and trove wouldn't know the difference.  In other
words, you shouldn't have to modify any of the distribution code or
manipulate the offsets and sizes in the trove code; just use what the
flow layer gives you.

The changes to the flow layer require that PINT_process_request  
return logical offsets instead of physical offsets (for both reads  
and writes).  It will do this if you pass PINT_CLIENT instead of  
PINT_SERVER as the mode (5th argument).  You will need to do this in  
each instance of PINT_process_request where PINT_SERVER is used.

PINT_process_request is a bit hard to use in some cases, but these  
changes are simple enough, and the offsets and sizes are treated  
opaquely everywhere outside of the function (except in AIO of  
course), which turns out to be a nice design of the framework in my  
view.

One caveat:  You will probably want to either turn off small IO,  
which doesn't use the flow layer, or make the same modifications to  
PINT_process_request in the small-io.sm.  You can just turn it off by  
compiling with CFLAGS=-DPVFS2_SMALL_IO_OFF.

-sam

On June 27, 2006, at 11:28AM, John Bent wrote:

>
> Ok, I've removed the footnote.  Now I'm doing everything within the
> new trove layer and no longer doing it in PINT_distribute, although I
> did change some things slightly.  The problem was that
> PINT_ADD_SEGMENT was combining the segments assuming they were in
> their own individual stripe.  However, since they now must be
> interspersed with segments from other servers, they can no longer be
> combined.  (Obviously, this will adversely affect performance of the
> old trove layer, so it calls for some layer violation to only turn
> off merging depending on the trove layer selected.  I guess later, if
> I care about this, I can add a trove function to the trove function
> table to this effect.)
>
> I'm still, however, unable to read the files back correctly using the
> pvfs2 servers.
>
> John
>
> On Tue, 27 Jun 2006, John Bent wrote:
>
>>
>> Hello,
>>
>> I'm working on a pet research project in which I'm (somewhat
>> abashedly) actually _removing_ functionality from PVFS2.  What I'm
>> trying to do is create a new trove interface in which requests to
>> disk are no longer logically striped across multiple PVFS2 servers
>> each with its own physical storage but are rather passed
>> transparently from client through PVFS2 onto a second and underlying
>> shared file system on which each PVFS2 server is mounted.
>>
>> In order to do this, I have extended IO requests to pass the logical
>> filenames along with the handles and I have further modified
>> PINT_distribute (footnote 1) to use the file distribution info to
>> translate its physical offset into the actual logical file offset
>> and then pass this logical offset to the PINT_ADD_SEGMENT macro.
>>
>> This works in that files written to pvfs2 servers are transparently
>> created in the pvfs2 storage space.  These files can then be
>> correctly read directly from the other underlying shared file system.
>>
>> However, they can no longer be read correctly through the PVFS2
>> servers.  Perhaps when I write to the actual logical offsets instead
>> of to the striped offsets, I am fooling the pvfs2 servers into
>> thinking those logical offsets are actually the striped ones?  When I
>> try to read the file back, I get a file that is N times the correct
>> size where N is the number of data servers.  What happens is that
>> each server gives me each segment of the file thinking that segment
>> is unique to it.  (At least this is what I think is happening.)
>>
>> Does anyone have any suggestions where else I should look in the
>> code to modify this?
>>
>> Thanks,
>>
>> John
>>
>> footnote 1:  It is not very clean to do this in the PINT_distribute
>> function.  I did try to keep my changes isolated within the new
>> trove layer by passing the distribution info to the
>> trove_bstream_[read|write]_list functions, but this had the same
>> problem when I did the readback through the PVFS2 servers as well as
>> having the additional problem that the readback directly through the
>> other shared file system was _almost_ correct but somehow off by a
>> little bit (seemingly at the end of the file).
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>


