[Pvfs2-users] I'm in the dark, need someone to shed some light

Rob Ross rross at mcs.anl.gov
Tue May 13 18:52:38 EDT 2008


On May 13, 2008, at 4:22 PM, belcampo wrote:
> Rob Ross wrote:
>> Hi Henk,
>> Please be sure to CC pvfs2-users on future emails.
> Sorry, stupid that I didn't do that.

Easy to do; no worries.

>> Without any additional information, my guess is that every  
>> application you're using in this workflow performs very small I/Os.  
>> These operations are passed into the kernel, back out to
>> pvfs2-client, across the network and received by the PVFS server,
>> which then performs I/O on your application's behalf. If operations are
>> particularly small, this can be a lot of overhead.
> Top tells me the server side is 99.6% idle and the client side 95% idle. How
> could I determine what is causing the abnormal delays?
> Starting to play a DVD takes about 12 secs. After a few seconds it
> starts stuttering.
> Over NFS from one of the servers it takes about 1 sec to start and never
> stutters.

Heh well that's a little different -- that's a read workload. The NFS  
client is reading ahead.

Have a look at this:
   http://www.pvfs.org/cvs/pvfs-2-7-branch.build/doc/pvfs2-faq/pvfs2-faq.php#SECTION00074000000000000000
and this:
   http://www.pvfs.org/cvs/pvfs-2-7-branch.build/doc/pvfs2-faq/pvfs2-faq.php#SECTION00077000000000000000

Also this email and specifically the immutable option; you could set  
this on your files after you are done ripping and encoding:
   http://www.beowulf-underground.org/pipermail/pvfs2-developers/2006-September/002688.html

You'd probably want to use the pvfs2-xattr utility to set the  
attribute so you don't have to sudo it.

>> Other networked file systems can hide some of this latency by  
>> caching data (either coherently or not) on the client. PVFS does  
>> not do this, so each little operation goes across the wire.
> Can this be investigated with some network tool, and if yes, how?
>> There's really no advantage to using a parallel file system for the  
>> workload you have described,
> But should the disadvantage be in this order of magnitude?

Apparently :). You could strace the app to see how big/small the IOs  
are. Some apps have options for block sizes for IO that can be used to  
improve performance. Also, there's no reason to bother with striping  
files in this case, since you're accessing serially. You should set  
the number of datafiles (objects holding data) to 1 on the
directory you're storing into:
   setfattr -n "user.pvfs2.num_dfiles" -v "1" /mnt/pvfs2/directory
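To make the strace suggestion concrete, here is a minimal sketch for summarizing I/O sizes. It assumes you captured a trace with something like `strace -e trace=read,write -o trace.log <your app>` and that the lines look like the usual `read(3, "..."..., 4096) = 4096` form. The function names and the log file name are made up for illustration, and the regular expression only covers plain read/write calls, not variants such as pread64 or readv.

```python
import re
from collections import Counter

# Matches strace lines such as: read(3, "\0\0\1\272"..., 2048) = 2048
# An optional leading PID (as printed by strace -f) is tolerated.
SYSCALL_RE = re.compile(
    r'^(?:\d+\s+)?(read|write)\((\d+),.*,\s*(\d+)\)\s*=\s*(-?\d+)'
)

def summarize(lines):
    """Count successful read/write calls by (syscall, bytes transferred)."""
    sizes = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m and int(m.group(4)) >= 0:  # skip failed calls (return value -1)
            sizes[(m.group(1), int(m.group(4)))] += 1
    return sizes

def report(sizes):
    """Print one line per distinct (syscall, size) pair, smallest first."""
    for (call, nbytes), count in sorted(sizes.items()):
        print(f"{call:5s} {nbytes:8d} bytes  x {count}")

# Example use (trace.log is a hypothetical file name):
#   with open("trace.log") as f:
#       report(summarize(f))
```

If most of the reads turn out to be a few KB each, that would fit the small-I/O explanation above; if they are already large, the cause is probably elsewhere.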

>> unless you're planning on having a lot of systems doing this  
>> process in parallel and want a single place to store the output.
>> What sort of network do you have in this system? What sort of nodes  
>> are you using for the PVFS servers?
> All AMD 4000+ systems with 1Gb network cards and a 320GB disk in each
> of them.
> Copying from two clients to these 3 servers is > 100MB/sec, pretty
> close to what Gb ethernet can do.

In what context do you get that performance? How do tools like
pvfs2-cp compare in performance?
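One rough way to put numbers on this from the client side is to time a sequential read of the same file at several block sizes, once through the PVFS mount and once through NFS. The sketch below is only an illustration: the function names and the block sizes are arbitrary choices, and on a file system that caches on the client a repeated run will mostly measure the cache.

```python
import time

def read_throughput(path, block_size):
    """Read path sequentially in block_size chunks.

    Returns (total_bytes, elapsed_seconds, number_of_reads).
    """
    total = 0
    calls = 0
    start = time.perf_counter()
    # buffering=0 turns off Python's own buffering, so each f.read()
    # below maps to a single read request of at most block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
            calls += 1
    return total, time.perf_counter() - start, calls

def compare(path, block_sizes=(4096, 65536, 1048576, 4194304)):
    """Print read count and throughput for each block size."""
    for bs in block_sizes:
        total, secs, calls = read_throughput(path, bs)
        mb_s = total / secs / 1e6 if secs > 0 else float("inf")
        print(f"{bs:>8d} B blocks: {calls:>7d} reads, {mb_s:8.1f} MB/s")

# Example use (the path is hypothetical):
#   compare("/mnt/pvfs2/directory/movie.vob")
```

If throughput climbs steeply with block size on PVFS but stays flat on NFS, that would be consistent with per-request overhead being the bottleneck.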

Rob

