[PVFS2-CVS] commit by bradles in pvfs2/doc/design: distributions.tex

CVS commit program cvs at parl.clemson.edu
Thu May 20 15:23:24 EDT 2004


Update of /projects/cvsroot/pvfs2/doc/design
In directory styx.parl.clemson.edu:/tmp/cvs-serv3619/design

Modified Files:
	distributions.tex 
Log Message:
Updating the distributions design document to reflect current realities.

Added info about PVFS_sys_dist, the new methods for PINT_dist, and the
new initialization scheme.  Still does not mention how it fits into the
PINT_request stuff, probably because it doesn't need to.


Index: distributions.tex
===================================================================
RCS file: /projects/cvsroot/pvfs2/doc/design/distributions.tex,v
diff -p -u -r1.4 -r1.5
--- distributions.tex	4 Nov 2003 15:29:22 -0000	1.4
+++ distributions.tex	20 May 2004 18:23:24 -0000	1.5
@@ -1,7 +1,6 @@
 %
 % server design
 %
-
 \documentclass[11pt]{article} 
 \usepackage[dvips]{graphicx}
 \usepackage{times}
@@ -19,9 +18,9 @@
 \setlength{\parindent}{0pt}
 \setlength{\parskip}{11pt}
 
-\title{PVFS2 distributions design notes}
+\title{PVFS2 Distribution Design Notes}
 \author{PVFS Development Team}
-\date{February 2002}
+\date{May 2004}
 
 \begin{document}
 
@@ -40,7 +39,7 @@ methods.
 
 Files in PVFS appear as a linear sequence of bytes.  A specific byte
 in a file is identified by its offset from the start of this sequence.
-This is refered to here as a {\it logical offset}.  A contiguous
+This is referred to here as a \emph{logical offset}.  A contiguous
 sequence of bytes can be specified with a logical offset and an extent.
 
 Requests for access to file data can be to PVFS servers using various
@@ -50,26 +49,169 @@ formats must be decoded to produce a ser
 bytes each with a logical offset and extent.
 
 PVFS servers store some part of the logical byte sequence of each file
-in a a linear sequence of bytes or byte stream within a data space
+in a linear sequence of bytes or byte stream within a data space
 associated with the file.
 Bytes within this byte stream are identified by their offset from the
-start of the byte stream referred to here as a {\it physical offset}.
+start of the byte stream referred to here as a \emph{physical offset}.
 On the server the PVFS distribution methods are used to determine which
 portion of the requested data is stored on the server, and where in
 the associated byte stream the data is stored.
 
-The PVFS servers utilize the distribution methods to convert a logical
-offset and extent into one or more physical offsets and extents relative
-to the data space on the file server.  We next describe the methods used
-by the PVFS server and provide pseudo code for their use in decoding
-a request.
-
-\section{Methods}
+
+\section{System Interface Distributions}
+
+PVFS2 users should be able to utilize distributions effectively through
+the system interface.  APIs are exposed that allow users to create files
+with a user-specified distribution.  In the case that no distribution is
+specified (i.e. the NULL distribution is specified), the default distribution,
+simple stripe, is used.  The system interface must be initialized before
+distributions may be accessed.
+
+The external distribution API is exposed to users via the following data types
+and functions:
+
+\begin{verbatim}
+  struct PVFS_sys_dist;
+\end{verbatim}
+
+The system interface distribution structure.  It contains the distribution
+identifier (i.e. the name) and a pointer to an instance of the distribution
+parameters for this type of distribution.  In general, the user should not
+modify the data within this struct.
+
+\begin{verbatim}
+  int PVFS_sys_create( char* entry_name,
+                       PVFS_object_ref ref,
+                       PVFS_sys_attr,
+                       PVFS_credentials credentials,
+                       PVFS_sys_dist* dist,
+                       PVFS_sysresp_create* resp );
+\end{verbatim}
+
+Creates a file using the specified distribution.  If no distribution is
+specified, the default distribution \emph{simple\_stripe} is used during
+creation.  The distribution used during file creation is stored with the
+file and may not be changed later.  Altering the distribution used to 
+store the file contents could result in data corruption.
+
+\begin{verbatim}
+  PVFS_sys_dist* PVFS_sys_dist_lookup( const char* name );
+\end{verbatim}
+
+Allocates a new distribution instance by copying the internal distribution
+registered for the supplied name.  Note that the internal distribution has
+additional data not exposed through the system interface, but that should be
+fully configurable through the distribution parameters.
+
+\begin{verbatim}
+  int PVFS_sys_dist_free( PVFS_sys_dist* dist );
+\end{verbatim}
+
+Deallocates all system interface resources allocated during distribution
+lookup.
+
+\begin{verbatim}
+  int PVFS_sys_dist_setparam( PVFS_sys_dist* dist,
+                              const char* param,
+                              void* value );
+\end{verbatim}
+
+Set the distribution parameter specified by the string \emph{param} to
+\emph{value}.  The strings used to specify parameters are distribution defined
+but should generally correspond to the field name in the distribution's
+parameter struct.  All parameters must be set before the distribution is used
+in file creation.  Once a file is created, there is no safe way to modify
+the distribution parameters for that file.
+
+
+\section{Distribution Initialization}
+
+All distributions are registered during PVFS2 initialization.  Although there
+has been some discussion about having distributions function as loadable
+modules, there is currently no support for that feature within PVFS2.  All
+available distributions are loaded into a registration table during
+initialization and registered with the distribution name as the key.  When a
+user then wishes to create a distribution later, a lookup can be performed
+with the distribution name, and a copy of the registered distribution is
+returned.  The registered distribution itself is never actually modified after
+registration.  The only opportunity to modify the registered distribution is
+during the registration itself.  Each distribution implements a callback
+method named \emph{registration\_init} that is called during registration.  The
+function signature is described completely below; for now we merely want to
+note that this function is called exactly once (at registration time), and
+it is generally used by distributions to set up the distribution parameter
+strings (for use in PVFS\_sys\_dist\_setparam) and to set default parameter
+values.
+
+Distribution initialization is performed by the function 
+PINT\_dist\_initialize() in pint-dist-utils.h.  In order to add a new
+distribution to the table of registered distributions, it will be necessary to
+modify this function.
+
+
+\section{Internal Distribution Representation}
+
+PVFS2 distributions are internally represented with the struct PINT\_dist.
+This structure contains a pointer to the distribution name, methods,
+parameters and various sizes.  The internal distributions are used on both the
+clients and the metadata server, as well as being stored physically with the 
+file metadata.
+
+When a user creates a file, the system distribution supplied, or the default
+distribution, is exchanged for a corresponding PINT\_dist structure.  It is this
+structure that will be used for any further operations performed on the file
+and stored in the metadata for the file.  
+
+The client and server both use the distribution methods to fulfill the request 
+from the client to the server to locate a specific byte range in a specific 
+file.  All this processing is performed within the PINT request for the file 
+and byte range.  The main difference between client and server processing is
+how segments are built: on the client, segments represent the distribution of
+data from the various servers, not the distribution of data on a single
+server.
+
+Distribution parameters are defined in the exported header for the
+distribution (e.g. for the simple stripe distribution, the header file is
+pvfs2-dist-simple-stripe.h).  The distribution methods are usually defined in
+a corresponding implementation file in the io/description subsystem (e.g. the
+simple stripe implementation is in io/description/dist-simple-stripe.c).
+
+The methods defined for each distribution allow it to completely specify how
+the file data is mapped to the PVFS2 disk abstraction, the data file object.
+The one possible exception to this is that distributions cannot currently
+assert their preference in how data file objects are mapped to data servers.
+This is planned for the near future; however, there is no current consensus on
+how to improve upon the current round robin mapping approach (see
+PINT\_bucket\_get\_next\_io).
+
+\section{Distribution Parameters}
+
+The parameters for each distribution are defined in a struct defined
+specifically for the distribution, and an individual instance of the
+parameters is stored in the metadata of every file.  
+
+Both the PVFS\_sys\_dist and PINT\_dist data structures maintain a pointer to
+the same distribution parameters.  The parameters are passed into every call to
+distribution code so that the distribution can modify its behavior as necessary.
+The distribution provider can also provide a method for setting the
+distribution parameters explicitly as described in the distribution methods
+below. 
+
+\section{Distribution Methods}
+
+The distribution methods are the code used by each distribution to
+perform mappings between the logical file data and the data file objects.  The
+methods also provide a mechanism for encoding/decoding the distribution
+parameters, determining the number of data file objects to create for a file,
+modifying distribution parameters, and distribution registration tasks.  For
+some of the methods a default implementation is available that may be
+acceptable for most distributions.
 
 \begin{verbatim}
-   	PVFS_offset logical_to_physical_offset (PVFS_Dist_parm *dparm,
-        		uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset logical_offset);
+  PVFS_offset logical_to_physical_offset( void* params,
+                                          uint32_t dfile_nr, 
+                                          uint32_t dfile_ct,
+                                          PVFS_offset logical_offset );
 \end{verbatim}
 
 Given a logical offset, return the physical offset that corresponds to
@@ -78,9 +220,10 @@ down to the largest physical offset held
 logical offset does not map to a physical offset on that server.
 
 \begin{verbatim}
-   	PVFS_offset physical_to_logical_offset (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset physical_offset);
+  PVFS_offset physical_to_logical_offset( void* params,
+                                          uint32_t dfile_nr, 
+                                          uint32_t dfile_ct,	
+                                          PVFS_offset physical_offset );
 \end{verbatim}
 
 Given a physical offset, return the logical offset that corresponds to
@@ -88,9 +231,10 @@ that physical offset.  Returns a logical
 assumed to be on the current PVFS server.
 
 \begin{verbatim}
-   	PVFS_offset next_mapped_offset (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset logical_offset);
+  PVFS_offset next_mapped_offset( void* params,
+                                  uint32_t dfile_nr, 
+                                  uint32_t dfile_ct, 
+                                  PVFS_offset logical_offset );
 \end{verbatim}
 
 Given a logical offset, find the logical offset greater than or equal
@@ -98,126 +242,57 @@ to the logical offset that maps to a phy
 PVFS server.  Returns a logical offset.
 
 \begin{verbatim}
-   	PVFS_size contiguous_length (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset physical_offset);
+  PVFS_size contiguous_length( void* params,
+                               uint32_t dfile_nr, 
+                               uint32_t dfile_ct, 
+                               PVFS_offset physical_offset );
 \end{verbatim}
 
 Beginning in a given physical location, return the number of contiguous
 bytes in the physical bytes stream on the current PVFS server that map
 to contiguous bytes in the logical byte sequence.  Returns a length in bytes.
 
-PVFS distribution processing pseudo code:
+\begin{verbatim}
+  int get_num_dfiles( void* params,
+                      uint32_t num_servers_requested, 
+                      uint32_t num_dfiles_requested );
+\end{verbatim}
+
+Returns the number of data file objects to use for the requested file.  The
+number of servers requested and number of data files requested are hints from
+the user that the distribution can ignore if necessary.  A default
+implementation of this function is provided in pint-dist-utils.h that returns
+the number of servers requested (which is usually the number of data servers
+in the system).
 
 \begin{verbatim}
-	// INPUTS
-   PVFS_offset offset;      // logical offset of requested data
-   PVFS_size size;          // size of requested data
-   int req_type;            // type of read A_READ or A_WRITE
-   PVFS_Dist_parm *d_p;     // point to file distribution parameter structure
-	uint32_t server_nr;    // number of iods data distributed on
-	uint32_t server_ct;  // ordinal number this iod
-	PVFS_distribution *dist; // distribution methods
-
-	// LOCALS
-   PVFS_offset loff;
-   PVFS_offset diff;
-   PVFS_offset poff;
-   PVFS_size   sz;
-   PVFS_size   fraglen;
-
-   loff = (*dist->next_mapped_offset) (d_p, server_nr, server_ct, offset);
-   while ((diff = loff - offset) < size)
-   {
-      poff = (*dist->logical_to_physical_offset)(d_p,server_nr,server_ct,loff);
-      sz = size - diff;
-      if (poff+sz > m_p->fsize && req_type==A_READ) // check for append 
-      {
-         /* update the file size info */
-         if (update_fsize() < 0) return(-1);
-         if (poff+sz > m_p->fsize) sz = m_p->fsize - poff; // stop @ EOF
-         if (sz <= 0)
-         {
-            // hit end of file
-            return(1);
-         }
-      }
-      fraglen = (*dist->contiguous_length) (d_p, server_nr, server_ct, poff);
-      if (sz <= fraglen || m_p->pcount == 1) // all in 1 block
-      {
-         create_segment (poff, sz);
-         return(0);
-      }
-      else // frag extends beyond this stripe
-      {
-         create_segment (poff, fraglen);
-      }
-      /* prepare for next iteration */
-      loff  += fraglen;
-      size  -= loff - offset;
-      offset = loff;
-      loff = (*dist->next_mapped_offset) (d_p, server_nr, server_ct, offset);
-   }
-\end{verbatim}
-
-\section{Client Processing}
-
-PVFS clients run the same code as a PVFS server, but the way segments
-are built is different as they represent the distribution of data from
-the various servers, not the distribution of data on the server.
-
-\section{Distribution Registration}
-
-Distributions are registerd with PVFS byt either compiling a
-distribution method entry into the distribution table of the PVFS code
-or by dynamically adding a method entry to the table.   Distribution
-method entries are registration functions are defined as follows:
-
-\begin{verbatim}
-	struct PVFS_Distribution {
-   	char *dist_name;
-   	int param_size;
-   	PVFS_offset (*logical_to_physical_offset) (PVFS_Dist_parm *dparm,
-        		uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset logical_offset);
-   	PVFS_offset (*physical_to_logical_offset) (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset physical_offset);
-   	PVFS_offset (*next_mapped_offset) (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset logical_offset);
-   	PVFS_size (*contiguous_length) (PVFS_Dist_parm *dparm,
-         	uint32_t server_nr, uint32_t server_ct,
-         	PVFS_offset physical_offset);
-	};
-
-   void PVFS_register_distribution(struct PVFS_distribution *d_p);
-
-   void PVFS_unregister_distribution(char *dist_name);
-\end{verbatim}
-
-Dynamically loaded modules are expected to provide initialization and
-cleanup functions as follows:
-
-\begin{verbatim}
-   void init_module();
-
-   void cleanup_module();
-\end{verbatim}
-
-The init\_module function would generally register the distribution and
-the cleanup\_module function would generally unregister the
-distribution.
+  int set_param( const char* dist_name, void* params,
+                 const char* param_name, void* value );
+\end{verbatim}
 
-\section{Distribution Parameters}
+Set the distribution parameter described by \emph{param\_name} to
+\emph{value}.  A default implementation is provided in pint-dist-utils.h that
+can handle parameters that have been previously registered.
+
+\begin{verbatim}
+  void encode_lebf( char** pptr, void* params );
+\end{verbatim}
+
+Write \emph{params} into the data stream pptr in little endian byte format.
+
+\begin{verbatim}
+  void decode_lebf( char** pptr, void* params );
+\end{verbatim}
+
+Read \emph{params} from the data stream pptr in little endian byte format.
+
+\begin{verbatim}
+  void registration_init( void* params );
+\end{verbatim}
+
+Called when the distribution is registered (i.e. once).  Used to set default
+distribution values, register parameters, or any other initialization activity
+needed by the distribution.
 
-Distributions may define a structure containing parameters for the
-distribution which are assigned on a per-file basis, stored with the
-file metadata, and provided to each method when it is called.  Default
-parameters are provided with the methods and are used if a NULL pointer
-to distribution parameters is passed into the method.  The definition of
-the parameter structures should be provided to user programs via an
-include file, where the parameters can be initialized and passed in to
-the system through an interface routine.
 
-\end{document}
+\end{document}
\ No newline at end of file
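
For readers of the diff above, the four offset-mapping methods in the
"Distribution Methods" section can be illustrated with a minimal sketch of a
round-robin striping scheme.  This is not the PVFS2 simple stripe source; the
toy_* names, the toy_stripe_params struct, and its strip_size field are
assumptions made for illustration, and int64_t stands in for PVFS_offset and
PVFS_size.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t toy_offset;  /* stand-in for PVFS_offset (assumption) */
typedef int64_t toy_size;    /* stand-in for PVFS_size (assumption) */

/* Hypothetical parameter struct: bytes per strip on one data file. */
typedef struct {
    toy_offset strip_size;
} toy_stripe_params;

/* Number of this data file's bytes that precede the logical offset.  When
 * the logical offset is stored on this data file, that count is its
 * physical offset. */
static toy_offset toy_logical_to_physical(const toy_stripe_params *p,
                                          uint32_t dfile_nr, uint32_t dfile_ct,
                                          toy_offset logical)
{
    assert(p->strip_size > 0 && dfile_nr < dfile_ct && logical >= 0);
    toy_offset stripe = p->strip_size * dfile_ct;  /* one strip per data file */
    toy_offset physical = (logical / stripe) * p->strip_size;
    toy_offset leftover = logical % stripe;
    toy_offset my_start = p->strip_size * dfile_nr;
    if (leftover >= my_start) {
        leftover -= my_start;
        physical += leftover < p->strip_size ? leftover : p->strip_size;
    }
    return physical;
}

/* Inverse mapping for an offset that lives on data file dfile_nr. */
static toy_offset toy_physical_to_logical(const toy_stripe_params *p,
                                          uint32_t dfile_nr, uint32_t dfile_ct,
                                          toy_offset physical)
{
    assert(p->strip_size > 0 && dfile_nr < dfile_ct && physical >= 0);
    toy_offset strip_nr = physical / p->strip_size;
    toy_offset within = physical % p->strip_size;
    return (strip_nr * dfile_ct + dfile_nr) * p->strip_size + within;
}

/* Smallest logical offset >= logical that is stored on data file dfile_nr. */
static toy_offset toy_next_mapped_offset(const toy_stripe_params *p,
                                         uint32_t dfile_nr, uint32_t dfile_ct,
                                         toy_offset logical)
{
    assert(p->strip_size > 0 && dfile_nr < dfile_ct && logical >= 0);
    toy_offset stripe = p->strip_size * dfile_ct;
    toy_offset base = (logical / stripe) * stripe;
    toy_offset leftover = logical - base;
    toy_offset my_start = p->strip_size * dfile_nr;
    if (leftover < my_start)
        return base + my_start;           /* our strip is still ahead */
    if (leftover < my_start + p->strip_size)
        return logical;                   /* already inside our strip */
    return base + stripe + my_start;      /* our strip in the next stripe */
}

/* Bytes remaining in the strip that contains the physical offset. */
static toy_size toy_contiguous_length(const toy_stripe_params *p,
                                      uint32_t dfile_nr, uint32_t dfile_ct,
                                      toy_offset physical)
{
    (void)dfile_nr;
    (void)dfile_ct;
    assert(p->strip_size > 0 && physical >= 0);
    return p->strip_size - physical % p->strip_size;
}
```

With a strip size of 4 bytes and three data files, data file 1 holds logical
bytes 4-7, 16-19, and so on.  In this sketch, logical offset 17 maps to
physical offset 5 on data file 1, and 3 contiguous bytes remain from there.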


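The registration scheme described in the "Distribution Initialization" section
(register once by name, run the registration callback once, hand out copies on
lookup) can be sketched as follows.  This is a toy model under stated
assumptions, not the PINT_dist implementation: the toy_* names, the fixed-size
table, and the default strip size of 65536 are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_DISTS 8

/* Hypothetical stand-in for an internal distribution record. */
typedef struct {
    const char *name;                         /* lookup key */
    void *params;                             /* distribution parameters */
    size_t params_size;                       /* size of *params in bytes */
    void (*registration_init)(void *params);  /* called once at registration */
} toy_dist;

static toy_dist toy_registry[TOY_MAX_DISTS];
static size_t toy_registry_count = 0;

/* Add a distribution to the table and run its one-time init callback. */
static int toy_dist_register(const toy_dist *d)
{
    assert(d != NULL && d->name != NULL && d->params != NULL);
    if (toy_registry_count == TOY_MAX_DISTS)
        return -1;
    toy_registry[toy_registry_count] = *d;
    if (d->registration_init != NULL)
        d->registration_init(d->params);
    toy_registry_count++;
    return 0;
}

/* Return a heap-allocated copy of the registered distribution, including a
 * private copy of its parameters, or NULL if the name is unknown. */
static toy_dist *toy_dist_lookup(const char *name)
{
    for (size_t i = 0; i < toy_registry_count; i++) {
        if (strcmp(toy_registry[i].name, name) != 0)
            continue;
        toy_dist *copy = malloc(sizeof *copy);
        if (copy == NULL)
            return NULL;
        *copy = toy_registry[i];
        copy->params = malloc(copy->params_size);
        if (copy->params == NULL) {
            free(copy);
            return NULL;
        }
        memcpy(copy->params, toy_registry[i].params, copy->params_size);
        return copy;
    }
    return NULL;
}

/* Release a copy obtained from toy_dist_lookup. */
static void toy_dist_free(toy_dist *d)
{
    if (d != NULL) {
        free(d->params);
        free(d);
    }
}

/* Example distribution parameters and registration callback (hypothetical). */
typedef struct {
    int64_t strip_size;
} toy_stripe_params;

static void toy_stripe_registration_init(void *params)
{
    ((toy_stripe_params *)params)->strip_size = 65536;  /* assumed default */
}
```

Because lookup copies the parameters, a caller can change the strip size on
its own copy before creating a file without altering the registered defaults
that later lookups will see.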
