[PVFS-developers] Re: [PVFS-users] Recompile pvfs module for SuSE 2.4.19-NUMA

Rob Ross rross at mcs.anl.gov
Mon Mar 8 17:32:34 EST 2004


Hey,

What's your strip size default?

So adjusting those parameters did have a positive effect for many cases, 
but the 256KB read case is still bad?

Is it consistently bad for ever-larger sizes, or is that particular size a 
bad one?

Thanks,

Rob
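
One way to answer the question above (is 256KB a single bad size, or do
reads stay slow as the size grows?) is to time sequential reads at several
request sizes. The following is a minimal sketch and not part of the
original exchange. The names read_throughput and sweep and the list of sizes
are illustrative assumptions, and the script stands in for the dd runs
mentioned in the thread. Caching on the client, if any, can inflate the
numbers.

```python
import sys
import time


def read_throughput(path, block_size, max_bytes=None):
    """Sequentially read `path` in `block_size`-byte requests; return MB/s.

    Stops at end of file, or once roughly `max_bytes` bytes have been read.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each read() below is issued
    # as a single request of up to block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed


def sweep(path, sizes_kb=(64, 128, 256, 512, 1024, 2048)):
    """Return {request size in KB: read rate in MB/s} for each size."""
    return {kb: read_throughput(path, kb * 1024) for kb in sizes_kb}


if __name__ == "__main__" and len(sys.argv) > 1:
    for kb, rate in sweep(sys.argv[1]).items():
        print(f"{kb:5d} KB  {rate:8.1f} MB/s")
```

Run it against a large file on the PVFS mount (the thread uses files of at
least 1GB). A dip at 256KB only and a steady decline above 256KB would point
to different causes.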

On Mon, 8 Mar 2004, Claude Pignol wrote:

> Rob,
> 
> 
> I/O 64KB no problem
> I/O 128KB no problem
> I/O 256KB: writes are no problem, but reads are 10 times slower.
> Tuning the parameters gives better performance when PVFS is working
> normally, but with 256KB I/O PVFS doesn't behave normally.
> The current parameters are
> r(w)mem_max 1048575
> write_buf 4096
> access_size 4096
> socket_buf 1024
> There are no error messages in the PVFS log.
> 
> Disks: RAID disk that can deliver 30MB/s,
> dedicated to PVFS data
> 
> Regards
> Claude
> 
> Rob Ross wrote:
> 
> >On Mon, 8 Mar 2004, Claude Pignol wrote:
> >
> >>Rob Ross wrote:
> >>
> >>>Oh, I misunderstood what you were saying before.  I thought that the "few 
> >>>MB" was your file size, not your access size.
> >>> 
> >>The problem is the I/O size not the file size.
> >>
> >>>How many I/O servers do you have in the system?  How much memory do you 
> >>>have in your client?
> >>> 
> >>10 I/O servers, 1GB (dedicated for iod)
> >
> >Clients have this much RAM too?
> >
> >>>These four /proc values are the default and maximum socket buffer sizes, 
> >>>if I understand things correctly:
> >>> /proc/sys/net/core/rmem_default
> >>> /proc/sys/net/core/rmem_max
> >>> /proc/sys/net/core/wmem_default
> >>> /proc/sys/net/core/wmem_max
> >>> 
> >>r(w)mem_default is 65535
> >>r(w)mem_max is 131071
> >
> >I would adjust these up significantly.  I've seen suggestions of as much 
> >as 8MB for wide area; maybe try 1MB and see how that goes?  We're much 
> >nicer about socket usage now, so it shouldn't be too much of a resource 
> >hog.
> >
> >I don't think the client adjusts these, so it's going to use the default.  
> >The iod *does* adjust these -- see below.
> >
> >>>Also, you might want to adjust the following in your iod.conf file (see 
> >>>man pages for details): socket_buf, access_size.
> >>> 
> >>write_buf 512
> >>access_size 512
> >>socket_buf 64
> >
> >I would adjust access_size up to some multiple of the new wmem_max so that 
> >there is a large enough memory mapped region to fill the buffer with one 
> >send.  Likewise for write_buf.
> >
> >I would adjust socket_buf to be the same as r(w)mem_max, because that is 
> >what the iod will use.
> >
> >>>About where does the dropoff start to occur?
> >>> 
> >>I/O size of 256KB
> >>
> >>The read rate is around 4MB/s for I/O of 1024K
> >>
> >>Thanks
> >>Claude
> >
> >Let me know if this helps.  Also, as a kick-start for the next stage, what 
> >sort of storage do you have on those nodes (single disks, SW RAID, FC 
> >attached, ...)?
> >
> >Thanks,
> >
> >Rob
> >
> >>>Regards,
> >>>
> >>>Rob
> >>>
> >>>On Mon, 8 Mar 2004, Claude Pignol wrote:
> >>>
> >>> 
> >>>>Thanks Rob,
> >>>>
> >>>>Another fact:
> >>>>I found that the read works very well with 64K I/O: the read speed is 
> >>>>better than the write speed.
> >>>>The read performance starts degrading when I increase the I/O size.
> >>>>
> >>>>I agree that there is a startup cost, but there is the read-ahead mechanism
> >>>>that speeds up the disk access.
> >>>>I am testing with files of at least 1GB.
> >>>>
> >>>>I have tested with dynamic buffering (the default) and with static buffering.
> >>>>Same problem.
> >>>>How do you increase the TCP buffer size?
> >>>>net.ipv4.tcp_rmem
> >>>>net.ipv4.tcp_wmem
> >>>>net.ipv4.tcp_mem
> >>>>
> >>>>
> >>>>Claude
> >>>>
> >>>>
> >>>>Rob Ross wrote:
> >>>>
> >>>>>Hi Claude,
> >>>>>
> >>>>>Sorry we didn't get back to you sooner.  I'm glad that the kernel update 
> >>>>>fixed the problem.
> >>>>>
> >>>>>What block size (bs=XXX) are you using in your tests?
> >>>>>
> >>>>>Note that when reading, no I/O can start until data is read off disk, while 
> >>>>>in the write case data can start moving right away.  So you may just be 
> >>>>>seeing startup costs.
> >>>>>
> >>>>>You could look at increasing TCP buffer sizes on your system as a first 
> >>>>>step.
> >>>>>
> >>>>>Regards,
> >>>>>
> >>>>>Rob
> >>>>>
> >>>>>On Mon, 8 Mar 2004, Claude Pignol wrote:
> >>>>>
> >>>>>>Greetings,
> >>>>>>
> >>>>>>An upgrade to 2.4.21 fixes the problem.
> >>>>>>Compile and start OK.
> >>>>>>I have noticed a performance problem when reading from PVFS.
> >>>>>>With big I/O (a few MB), reading runs at around 1/3 of the write performance.
> >>>>>>The PVFS daemons use the default parameters.
> >>>>>>I am reading/writing from one node to PVFS using dd.
> >>>>>>I have verified the disk performance of all 10 I/O nodes.
> >>>>>>I have also verified the network performance to all the nodes.
> >>>>>>What is the best strategy or tool to address this kind of problem?
> >>>>>>Thanks
> >>>>>>
> >>>>>>
> >>>>>>Claude Pignol wrote:
> >>>>>>
> >>>>>>>Greetings,
> >>>>>>>
> >>>>>>>I am trying to benchmark PVFS with the SuSE 2.4.19-NUMA kernel
> >>>>>>>to compare it with the SuSE 2.4.19-SMP kernel.
> >>>>>>>The pvfs.o module compiles and loads without problems on the SMP kernel.
> >>>>>>>
> >>>>>>>With the NUMA kernel I get 3 unresolved symbols when I try to load the
> >>>>>>>module:
> >>>>>>>pvfs.o: unresolved symbol __pollwait
> >>>>>>>pvfs.o: unresolved symbol mem_map
> >>>>>>>pvfs.o: unresolved symbol iget4
> >>>>>>>
> >>>>>>>The kernel source is installed.
> >>>>>>>Any idea?
> >>>>>>>Thanks in advance
> >>>>>>>Claude
> >>>>>>>
> >>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>PVFS-users mailing list
> >>>>>>>PVFS-users at www.beowulf-underground.org
> >>>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
> >>>>>>>
> >>>>>>_______________________________________________
> >>>>>>PVFS-developers mailing list
> >>>>>>PVFS-developers at www.beowulf-underground.org
> >>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers
> >>>>>>
> 
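
The tuning advice quoted in this thread (set socket_buf equal to
r(w)mem_max, and make access_size and write_buf a multiple of it) can be
checked mechanically. The following is a minimal sketch and not part of the
original exchange. It assumes that the iod.conf values are given in KB and
the /proc values in bytes, which matches the numbers reported here but
should be confirmed against the iod.conf man page. All function names are
illustrative.

```python
import os

PROC_DIR = "/proc/sys/net/core"  # Linux location of the values named in the thread


def read_proc_bytes(name):
    """Read rmem_max, wmem_max, rmem_default or wmem_default (in bytes)."""
    with open(os.path.join(PROC_DIR, name)) as f:
        return int(f.read().strip())


def bytes_to_kb(n):
    """Round up to whole KB, so that 1048575 bytes counts as 1024 KB."""
    return -(-n // 1024)


def check_iod_settings(mem_max_bytes, socket_buf_kb, access_size_kb, write_buf_kb):
    """Return a list of warnings; an empty list means the advice is followed.

    Assumption: the iod.conf values are expressed in KB.
    """
    mem_max_kb = bytes_to_kb(mem_max_bytes)
    if mem_max_kb <= 0:
        raise ValueError("mem_max_bytes must be positive")
    warnings = []
    # socket_buf should match the kernel maximum, since the iod requests it.
    if socket_buf_kb != mem_max_kb:
        warnings.append(
            f"socket_buf is {socket_buf_kb} KB but r(w)mem_max is {mem_max_kb} KB"
        )
    # access_size and write_buf should be a whole multiple of the maximum.
    for name, value in (("access_size", access_size_kb), ("write_buf", write_buf_kb)):
        if value < mem_max_kb or value % mem_max_kb != 0:
            warnings.append(f"{name} ({value} KB) is not a multiple of {mem_max_kb} KB")
    return warnings
```

With the tuned values reported at the top of this message,
check_iod_settings(1048575, 1024, 4096, 4096) returns an empty list. With
the earlier values, check_iod_settings(131071, 64, 512, 512) flags only
socket_buf. On a live node, read_proc_bytes("rmem_max") can supply the first
argument.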


