# [Pvfs2-cvs] commit by pcarns in pvfs2-1/doc: pvfs2-tuning.tex

CVS commit program cvs at parl.clemson.edu
Thu Aug 7 12:30:12 EDT 2008

Update of /projects/cvsroot/pvfs2-1/doc
In directory parlweb1:/tmp/cvs-serv6370/doc

Modified Files:
Tag: small-file-branch
pvfs2-tuning.tex
Log Message:
merge trunk updates down to small-file-branch.  Passes basic tests but needs
some double checking of pint-cached-config and sys-create conflicts.

Index: pvfs2-tuning.tex
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/doc/pvfs2-tuning.tex,v
diff -p -u -r1.1 -r1.1.22.1
--- pvfs2-tuning.tex	11 Oct 2006 17:02:50 -0000	1.1
+++ pvfs2-tuning.tex	7 Aug 2008 16:30:12 -0000	1.1.22.1
@@ -123,15 +123,251 @@ Sync or not, metadata, data

-\subsection{Extended Attributes}
+\section{Number of Datafiles}

-\subsubsection{Directory Hints}
+Each file stored on PVFS is broken into smaller parts to be
+distributed across servers.  The metadata (which includes
+information such as the owner and permissions of the file) is stored in
+a metadata file on one server.  The actual file contents are stored in
+\emph{datafiles} distributed among multiple servers.
+
+By default, each file in PVFS is made up of N datafiles, where N is
+the number of servers.  In most situations, this is the most efficient
+number of datafiles to use because it leverages all available resources
+evenly and allows the load to be distributed.  This is especially
+beneficial for large files accessed in parallel.
+
+However, there are also cases where it may be helpful to use a different
+number of datafiles.  For example, if you have a set of many small
+files (a few KB each) that are accessed in serial, then distributing
+them across all servers increases overhead without gaining any benefit
+from parallelism.  In this case it will most likely perform better if
+the number of datafiles is set to 1 for each file.
+
+The most straightforward way to change the number of datafiles is by
+setting extended attributes on a directory.  Any new files created in
+that directory will utilize the specified number of datafiles
+regardless of how many servers are available.  New subdirectories will
+inherit the same settings.  It is also possible to set a default number
+of datafiles in the configuration file for the entire file system, but
+we suggest experimenting at a directory level first.
+
+PVFS does not allow the number of datafiles to be changed dynamically
+for existing files.  If you wish to convert an existing file, then you must
+copy it to a new file with the appropriate datafile setting and then delete
+the old file.
+
+Use this command to specify the number of datafiles to use within a given
+directory:
+
+\begin{verbatim}
+$ setfattr -n user.pvfs2.num_dfiles -v 1 /mnt/pvfs2/dir
+\end{verbatim}
+
+Use these commands to create a new file and confirm the number of
+datafiles that it is using:
+
+\begin{verbatim}
+$ touch /mnt/pvfs2/dir/foo
+$ pvfs2-viewdist -f /mnt/pvfs2/dir/foo
+dist_name = simple_stripe
+dist_params:
+strip_size:65536
+
+Number of datafiles/servers = 1
+Server 0 - tcp://localhost:3334, handle: 5223372036854744173
+(487d2531626f846d.bstream)
+\end{verbatim}
+
+\section{Distributions}
+
+A \emph{distribution} is an algorithm that defines how a file's data
+will be distributed among available servers.  PVFS provides an API
+for implementing arbitrary distributions, but four specific ones are
+available by default.  Each distribution has different performance
+characteristics that may be helpful depending on your workload.
+
+The most straightforward way to use an alternative distribution is by
+setting an extended attribute on a specific directory.  Any new files
+created in that directory will utilize the specified distribution and
+parameters.  New subdirectories will inherit the same settings.  You may
+also use the server configuration files to define default distribution
+settings for the entire file system, but we suggest experimenting at a
+directory level first.
+
+PVFS does not allow the distribution to be changed dynamically for
+existing files.  If you wish to convert an existing file, then you must
+copy it to a new file with the appropriate distribution and then delete
+the old file.
+
+This section describes the four available distributions and gives
+command line examples of how to use each one.
+
+% TODO: some figures would be spiffy (but time consuming)
+
+\subsection{Simple Stripe}
+
+The simple stripe distribution is the default distribution used by PVFS.
+It dictates that file data will be striped evenly across all available
+servers in a round robin fashion.  This is very similar to RAID 0,
+except the data is distributed across servers rather than local block
+devices.
+
+The only tuning parameter within simple stripe is the \emph{strip size}.
+The strip size determines how much data is stored on one server before
+switching to the next server.  The default value in PVFS is 64 KB.  You
+may want to experiment with this value in order to find a tradeoff
+between the amount of concurrency achieved with small accesses vs. the
+amount of data streamed to each server.
+
+\begin{verbatim}
+# to enable simple stripe distribution for a directory:
+$ setfattr -n user.pvfs2.dist_name -v simple_stripe /mnt/pvfs2/dir
+
+# to change the strip size to 128 KB:
+$ setfattr -n user.pvfs2.dist_params -v strip_size:131072 /mnt/pvfs2/dir
+
+# to create a new file and confirm the distribution:
+$ touch /mnt/pvfs2/dir/file
+$ pvfs2-viewdist -f /mnt/pvfs2/dir/file
+dist_name = simple_stripe
+dist_params:
+strip_size:131072
+
+Number of datafiles/servers = 4
+Server 0 - tcp://localhost:3337, handle: 8223372036854744180 (721f494c589b8474.bstream)
+Server 1 - tcp://localhost:3334, handle: 5223372036854744174 (487d2531626f846e.bstream)
+Server 2 - tcp://localhost:3335, handle: 6223372036854744178 (565ddbe509d38472.bstream)
+Server 3 - tcp://localhost:3336, handle: 7223372036854744180 (643e9298b1378474.bstream)
+\end{verbatim}
+
+\subsection{Basic}
+
+The basic distribution is mainly included as an example for distribution
+developers.  It performs no striping at all, and instead places all
+data on one server.  The basic distribution overrides the number of
+datafiles (as shown in the previous section) and only uses one datafile
+in all cases.  There are no tunable parameters.
+
+\begin{verbatim}
+# to enable basic distribution for a directory:
+$ setfattr -n user.pvfs2.dist_name -v basic_dist /mnt/pvfs2/dir
+
+# to create a new file and confirm the distribution:
+$ touch /mnt/pvfs2/dir/file
+$ pvfs2-viewdist -f /mnt/pvfs2/dir/file
+dist_name = basic_dist
+dist_params:
+none
+Number of datafiles/servers = 1
+Server 0 - tcp://localhost:3334, handle: 5223372036854744172 (487d2531626f846c.bstream)
+\end{verbatim}
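
To contrast the basic distribution with simple stripe, the following is a minimal sketch of how a logical file offset could map to a datafile under each one. It is an illustration built only from the descriptions above, not PVFS code: the function names are hypothetical, and the server index is assumed to correspond to the "Server N" numbering printed by pvfs2-viewdist.

```python
def simple_stripe_server(offset, strip_size=65536, num_servers=4):
    """Map a logical file offset to (server index, offset within that datafile).

    Assumes plain round robin striping: strip 0 goes to server 0,
    strip 1 to server 1, and so on, wrapping after the last server.
    """
    strip_index = offset // strip_size
    server = strip_index % num_servers
    # Every complete pass over all servers adds one strip to each datafile.
    local_offset = (strip_index // num_servers) * strip_size + offset % strip_size
    return server, local_offset


def basic_server(offset):
    """The basic distribution performs no striping: one datafile holds everything."""
    return 0, offset


if __name__ == "__main__":
    for off in (0, 65536, 200000, 262144):
        print(off, simple_stripe_server(off), basic_server(off))
```

With the default 64 KB strip size and four servers, offsets 0 and 262144 both land on server 0, because the fifth strip wraps back around to the first server.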
+
+
+\subsection{Two Dimensional Stripe}
+
+The two dimensional stripe distribution is a variation of the
+simple stripe distribution that is intended to combat the effects of
+\emph{incast}.  Incast occurs when a client requests a range of data that
+is striped across several servers, therefore causing the transmission of
+data from many sources to one destination simultaneously.  Some networks
+or network protocols may perform poorly in this scenario due to switch
+buffering or congestion avoidance.  This problem becomes more
+significant as more servers are used.
+
+The two dimensional stripe distribution operates by grouping servers
+into smaller subsets and striping data within each group multiple times
+before switching.  Three parameters control the grouping and
+striping:

\begin{itemize}
-\item Number of Datafiles
-\item Stripe Size
-\item Distribution
+\item strip size: same as in the simple stripe distribution
+\item number of groups: how many groups to divide the servers into
+\item factor: how many times to stripe within each group
\end{itemize}
+
+The common access pattern that benefits from this distribution is the
+case of N clients operating on one file of size B, where each client is
+responsible for a contiguous region of size B/N.  With simple striping,
+each client may have to access all servers in order to read its specific
+B/N byte range.  With two dimensional striping (using appropriate
+parameters), each client only accesses a subset of servers.  All servers
+are still active over the file as a whole so that full bandwidth is
+still preserved.  However, the network traffic patterns limit the amount
+of incast produced by any one client.
+
+The default incast parameters use a strip size of 64 KB, 2 groups, and a
+factor of 256.
+
+\begin{verbatim}
+# to enable two dimensional stripe distribution for a directory:
+$ setfattr -n user.pvfs2.dist_name -v twod_stripe /mnt/pvfs2/dir
+
+# to change the strip size to 128 KB, the number of groups to 4, and the
+# factor to 128:
+$ setfattr -n user.pvfs2.dist_params -v strip_size:131072,num_groups:4,group_strip_factor:128 /mnt/pvfs2/dir
+
+# to create a new file and confirm the distribution:
+$ touch /mnt/pvfs2/dir/file
+$ pvfs2-viewdist -f /mnt/pvfs2/dir/file
+dist_name = twod_stripe
+dist_params:
+num_groups:4,strip_size:131072,factor:128
+
+Number of datafiles/servers = 4
+Server 0 - tcp://localhost:3336, handle: 7223372036854744175
+(643e9298b137846f.bstream)
+Server 1 - tcp://localhost:3337, handle: 8223372036854744175
+(721f494c589b846f.bstream)
+Server 2 - tcp://localhost:3334, handle: 5223372036854744167
+(487d2531626f8467.bstream)
+Server 3 - tcp://localhost:3335, handle: 6223372036854744173
+(565ddbe509d3846d.bstream)
+\end{verbatim}
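
The grouping behavior described above can be sketched as follows. This is one plausible reading of the prose, not the PVFS implementation: it assumes that servers are split into equally sized, contiguous groups, and that "factor" counts complete round robin passes over a group before data moves on to the next group. The function name and these assumptions are illustrative only.

```python
def twod_stripe_server(offset, num_servers, strip_size=65536,
                       num_groups=2, factor=256):
    """Return the server index holding a logical offset (illustrative only).

    Assumptions, none of them confirmed by the source text:
      * num_servers is evenly divisible by num_groups
      * group g owns servers [g * group_size, (g + 1) * group_size)
      * each visit to a group stripes across its servers `factor` times
    """
    if num_servers % num_groups != 0:
        raise ValueError("this sketch only handles equally sized groups")
    group_size = num_servers // num_groups
    strip_index = offset // strip_size
    strips_per_visit = group_size * factor
    group = (strip_index // strips_per_visit) % num_groups
    server_in_group = (strip_index % strips_per_visit) % group_size
    return group * group_size + server_in_group


if __name__ == "__main__":
    # Tiny parameters so the pattern is visible: 4 servers, 2 groups, factor 2.
    print([twod_stripe_server(i, 4, strip_size=1, num_groups=2, factor=2)
           for i in range(10)])
```

Under these assumptions, a client reading a contiguous region smaller than one group visit only contacts the servers in a single group, which is the property the distribution relies on to limit incast.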
+
+\subsection{Variable Strip}
+
+Variable strip is similar to simple stripe, except that it allows you to
+specify a different strip size for each server.  For example, you could
+place the first 32 KB on one server, the next 64 KB on the next server,
+and then 128 KB on the final server.  The striping still round robins
+once all servers have been filled.  This distribution may be useful for
+applications that have a very specific data format and can take
+advantage of a correspondingly specific placement of file data to match it.
+
+The only parameter used by the variable strip distribution is the
+\emph{strips} parameter, which specifies the strip size for each server.
+The number of datafiles used will not exceed the number of servers
+listed.  For example, if the strips parameter specifies a strip size for
+three servers, then files using this distribution will have at most
+three datafiles.
+
+The format of the strips parameter is a list of semicolon-separated
+$<server>:<strip size>$ pairs.  The strip size can be specified with shorthand
+notation, such as ``K'' for kilobytes or ``M'' for megabytes.
+
+\begin{verbatim}
+
+# to enable variable strip distribution for a directory:
+$ setfattr -n user.pvfs2.dist_name -v varstrip_dist /mnt/pvfs2/dir
+
+# to set the strip sizes to 32 KB, 64 KB, and 128 KB:
+$ setfattr -n user.pvfs2.dist_params -v "strips:0:32K;1:64K;2:128K" /mnt/pvfs2/dir
+
+# to create a new file and confirm the distribution:
+$ touch /mnt/pvfs2/dir/file
+$ pvfs2-viewdist -f /mnt/pvfs2/dir/file
+dist_name = varstrip_dist
+dist_params:
+0:32K;1:64K;2:128K
+Number of datafiles/servers = 3
+Server 0 - tcp://localhost:3336, handle: 7223372036854744173
+(643e9298b137846d.bstream)
+Server 1 - tcp://localhost:3334, handle: 5223372036854744165
+(487d2531626f8465.bstream)
+Server 2 - tcp://localhost:3335, handle: 6223372036854744171
+(565ddbe509d3846b.bstream)
+\end{verbatim}
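
As a rough illustration of the strips format and the resulting placement, the sketch below parses a strips value such as the one used above and reports which server would hold a given offset. It assumes that "K" and "M" mean 1024 and 1024*1024 bytes (consistent with 128 KB being written as 131072 elsewhere in this document) and that servers are visited in the order listed. The helper names are hypothetical and this is not PVFS code.

```python
def parse_strips(spec):
    """Parse a value like '0:32K;1:64K;2:128K' into [(server, bytes), ...]."""
    units = {"K": 1024, "M": 1024 * 1024}
    strips = []
    for pair in spec.split(";"):
        server, size = pair.split(":")
        size = size.strip()
        suffix = size[-1].upper()
        if suffix in units:
            nbytes = int(size[:-1]) * units[suffix]
        else:
            nbytes = int(size)  # no suffix: treat the value as plain bytes
        strips.append((int(server), nbytes))
    return strips


def varstrip_server(offset, strips):
    """Return the server holding `offset`, cycling round robin through strips."""
    cycle = sum(size for _, size in strips)
    pos = offset % cycle
    for server, size in strips:
        if pos < size:
            return server
        pos -= size


if __name__ == "__main__":
    strips = parse_strips("0:32K;1:64K;2:128K")
    for off in (0, 32768, 98304, 229376):
        print(off, varstrip_server(off, strips))
```

One full cycle here covers 224 KB (32 + 64 + 128), after which placement wraps back to the first server listed.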