[Pvfs2-developers] problems creating new trove-simple interface
Walter B. Ligon III
walt at clemson.edu
Wed Jun 28 13:12:45 EDT 2006
John, I'm not sure I completely understand what you are doing, but it
SOUNDS to me like what you need is to write a distribution for your
situation. You really shouldn't have to change Trove, as all it does is
read or write the segments as dictated by the distribution. Writing a
distribution should be fairly simple, and once you have done it you
should be able to track updates to PVFS easily, whereas modifying the
base code makes that pretty difficult.
Walt
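
As a rough illustration of what such a distribution would have to decide,
here is a minimal sketch in C. It assumes round-robin strip ownership and a
single shared file in which the physical offset equals the logical offset.
The type and function names (shared_dist_params, shared_dist_owner, and so
on) are hypothetical and are not the PVFS2 distribution interface; the
striped mapping is included only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout parameters; not a PVFS2 structure. */
typedef struct {
    uint64_t strip_size;   /* bytes per strip */
    uint32_t num_servers;  /* number of I/O servers */
} shared_dist_params;

/* Server that owns the strip containing this logical offset (round robin).
 * Giving each strip exactly one owner is what keeps servers from
 * overwriting each other or each returning the whole file. */
uint32_t shared_dist_owner(const shared_dist_params *p, uint64_t logical)
{
    assert(p->strip_size > 0 && p->num_servers > 0);
    return (uint32_t)((logical / p->strip_size) % p->num_servers);
}

/* Shared-file case: the physical offset is simply the logical offset. */
uint64_t shared_dist_logical_to_physical(const shared_dist_params *p,
                                         uint64_t logical)
{
    (void)p;  /* unused: the mapping is the identity */
    return logical;
}

/* For comparison: offset within a private per-server datafile under
 * plain round-robin striping. */
uint64_t striped_logical_to_physical(const shared_dist_params *p,
                                     uint64_t logical)
{
    assert(p->strip_size > 0 && p->num_servers > 0);
    uint64_t strip = logical / p->strip_size;
    uint64_t full_stripes = strip / p->num_servers;
    return full_stripes * p->strip_size + logical % p->strip_size;
}
```

With a 64 KB strip and 4 servers, logical offset 5*65536+100 belongs to
server 1 in both layouts; only the offset the server uses on disk differs.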
John Bent wrote:
> Thanks Sam,
>
> Sadly, however, your suggestion did not just work directly out of the box.
> The servers are not at the correct logical offsets and are overwriting
> each other. Additionally, your approach suffers the same problem that
> mine did which is that each datafile (which is actually now the same
> shared datafile) is read completely by the client from each server.
> Therefore, when reading a file, the client actually creates a new file
> that consists of N copies of the file, where N is the number of
> datafiles.
>
> Perhaps the cleanest solution would be to create a new distribution and
> pass this to the client. This distribution would simply instruct the
> client to use the same physical offsets as logical. This should then work
> equally well for reads and writes. I see in the io/description directory
> that there are already distributions for simple-stripe, basic, and
> varstrip. Is this a workable, good approach?
>
> John
>
>
> On Tue, 27 Jun 2006, Sam Lang wrote:
>
>
>>Hi John,
>>
>>I think the best way (others can correct me) to modify the pvfs2 code
>>to get the trove layer to operate on the logical offsets and sizes is
>>in the flow code (flowproto_multiqueue.c). My reasoning is that the
>>flow layer converts the PVFS_Request structure into the physical
>>offsets and sizes and passes them on to the trove layer. The trove
>>layer doesn't care whether the offsets and sizes passed to it (via
>>trove_bstream_{read|write}_list) are logical or physical offsets; it
>>just uses those values to operate directly on the bstream file in the
>>normal case. In your case the offsets and sizes passed to trove
>>could be logical, and trove wouldn't know the difference. In other
>>words, you shouldn't have to modify any of the distribution code, or
>>manipulate the offsets and sizes in the trove code, just use what the
>>flow layer gives you.
>>
>>The changes to the flow layer require that PINT_process_request
>>return logical offsets instead of physical offsets (for both reads
>>and writes). It will do this if you pass PINT_CLIENT instead of
>>PINT_SERVER as the mode (5th argument). You will need to do this in
>>each instance of PINT_process_request where PINT_SERVER is used.
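
To make the logical/physical distinction concrete, here is a toy model in
C. It is not PINT_process_request and does not use its signature; the
offset_mode enum merely stands in for the client/server mode argument, and
server_segments and segment are hypothetical names. It assumes plain
round-robin striping and lists the pieces of a logical range that one
server owns, reporting either kind of offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the server/client mode argument. */
typedef enum { OFFSETS_PHYSICAL, OFFSETS_LOGICAL } offset_mode;

typedef struct {
    uint64_t offset;
    uint64_t size;
} segment;

/* Walk the logical range [start, start + len) strip by strip and emit the
 * pieces owned by `server`. Returns the number of segments written, which
 * is at most max_segs. */
size_t server_segments(uint64_t start, uint64_t len,
                       uint64_t strip_size, uint32_t num_servers,
                       uint32_t server, offset_mode mode,
                       segment *out, size_t max_segs)
{
    size_t n = 0;
    uint64_t pos = start;
    uint64_t end = start + len;

    if (strip_size == 0 || num_servers == 0)
        return 0;

    while (pos < end && n < max_segs) {
        uint64_t strip = pos / strip_size;
        uint64_t strip_end = (strip + 1) * strip_size;
        uint64_t piece_end = strip_end < end ? strip_end : end;

        if (strip % num_servers == server) {
            /* Offset inside this server's private datafile. */
            uint64_t physical =
                (strip / num_servers) * strip_size + pos % strip_size;
            out[n].offset = (mode == OFFSETS_LOGICAL) ? pos : physical;
            out[n].size = piece_end - pos;
            n++;
        }
        pos = piece_end;
    }
    return n;
}
```

For a 32-byte request with 4-byte strips and 2 servers, server 1 gets four
segments either way: at offsets 0, 4, 8, 12 in physical mode and at
4, 12, 20, 28 in logical mode. Only the offsets change, not the ownership.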
>>
>>PINT_process_request is a bit hard to use in some cases, but these
>>changes are simple enough, and the offsets and sizes are treated
>>opaquely everywhere outside of the function (except in AIO of
>>course), which turns out to be a nice design of the framework in my
>>view.
>>
>>One caveat: You will probably want to either turn off small IO,
>>which doesn't use the flow layer, or make the same modifications to
>>PINT_process_request in the small-io.sm. You can just turn it off by
>>compiling with CFLAGS=-DPVFS2_SMALL_IO_OFF.
>>
>>-sam
>>
>>On June 27, 2006, at 11:28AM, John Bent wrote:
>>
>>
>>>Ok, I've removed the footnote. Now I'm doing everything within the new
>>>trove layer and no longer doing it in PINT_distribute, although I did
>>>change some things slightly. The problem was that PINT_ADD_SEGMENT was
>>>combining the segments assuming they were in their own individual stripe.
>>>However, since they now must be interspersed with segments from other
>>>servers, they can no longer be combined. (Obviously, this will adversely
>>>affect performance of the old trove layer, so it calls for some layer
>>>violation to only turn off merging depending on the trove layer selected.
>>>I guess later, if I care about this, I can add a trove function to the
>>>trove function table to this effect.)
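
One possible alternative to switching merging off per trove layer is to
merge only when two segments are truly adjacent. The sketch below is a
hypothetical helper in C, not the PINT_ADD_SEGMENT macro, and it assumes
segments arrive in increasing offset order.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;
    uint64_t size;
} segment;

/* Append a segment, merging it into the previous one only when the two
 * are contiguous. Returns the new segment count. A segment that neither
 * merges nor fits within `capacity` is dropped; a real implementation
 * would report that as an error. */
size_t add_segment(segment *segs, size_t count, size_t capacity,
                   uint64_t offset, uint64_t size)
{
    if (size == 0)
        return count;

    if (count > 0 &&
        segs[count - 1].offset + segs[count - 1].size == offset) {
        segs[count - 1].size += size;  /* contiguous: extend in place */
        return count;
    }

    if (count < capacity) {
        segs[count].offset = offset;
        segs[count].size = size;
        count++;
    }
    return count;
}
```

With per-server physical offsets, consecutive strips such as (0, 4) and
(4, 4) collapse into one segment of size 8. With logical offsets, the same
server's strips are (4, 4) and (12, 4), which have a gap between them and
stay separate.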
>>>
>>>I'm still, however, unable to read the files back correctly using the
>>>pvfs2 servers.
>>>
>>>John
>>>
>>>On Tue, 27 Jun 2006, John Bent wrote:
>>>
>>>
>>>>Hello,
>>>>
>>>>I'm working on a pet research project in which I'm (somewhat abashedly)
>>>>actually _removing_ functionality from PVFS2. What I'm trying to do is
>>>>create a new trove interface in which requests to disk are no longer
>>>>logically striped across multiple PVFS2 servers, each with its own
>>>>physical storage, but are rather passed transparently from the client
>>>>through PVFS2 onto a second, underlying shared file system that each
>>>>PVFS2 server has mounted.
>>>>
>>>>In order to do this, I have extended IO requests to pass the logical
>>>>filenames along with the handles, and I have further modified
>>>>PINT_distribute (footnote 1) to use the file distribution info to
>>>>translate its physical offset into the actual logical file offset and
>>>>then pass this logical offset to the PINT_ADD_SEGMENT macro.
>>>>
>>>>This works in that files written to pvfs2 servers are transparently
>>>>created in the pvfs2 storage space. These files can then be correctly
>>>>read directly from the other underlying shared file system.
>>>>
>>>>However, they can no longer be read correctly through the PVFS2 servers.
>>>>Perhaps when I write to the actual logical offsets instead of to the
>>>>striped offsets, I am fooling the pvfs2 servers into thinking those
>>>>logical offsets are actually the striped ones? When I try to read the
>>>>file back, I get a file that is N times the correct size, where N is the
>>>>number of data servers. What happens is that each server gives me each
>>>>segment of the file, thinking that segment is unique to it. (At least,
>>>>this is what I think is happening.)
>>>>
>>>>Does anyone have any suggestions where else I should look in the code to
>>>>modify this?
>>>>
>>>>Thanks,
>>>>
>>>>John
>>>>
>>>>footnote 1: It is not very clean to do this in the PINT_distribute
>>>>function. I did try to keep my changes isolated within the new trove
>>>>layer by passing the distribution info to the
>>>>trove_bstream_[read|write]_list functions, but this had the same problem
>>>>when I did the readback through the PVFS2 servers, as well as having the
>>>>additional problem that the readback directly through the other shared
>>>>file system was _almost_ correct but somehow off by a little bit
>>>>(seemingly at the end of the file).
>>>>
>>>>_______________________________________________
>>>>Pvfs2-developers mailing list
>>>>Pvfs2-developers at beowulf-underground.org
>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University