[PVFS-developers] Re: [PVFS-users] Recompile pvfs module for SuSE 2.4.19-NUMA

Claude Pignol cpignol at seismiccity.com
Mon Mar 8 16:13:46 EST 2004


Rob,


I/O size 64KB: no problem.
I/O size 128KB: no problem.
I/O size 256KB: writes are fine, but reads are 10 times slower.
Tuning the parameters does improve performance when things work
normally, but with 256KB I/O, PVFS doesn't behave normally.
The current parameters are:
r(w)mem_max 1048575
write_buf 4096
access_size 4096
socket_buf 1024
There are no error messages in the PVFS log.

Disks: RAID disk that can deliver 30MB/s,
dedicated to PVFS data.
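To reproduce the pattern above (fine at 64KB and 128KB, slow reads from 256KB up) without dd, a minimal Python sketch can sweep the request size against one file on the PVFS mount. The function names and the mount path are hypothetical. The file should be larger than client RAM (the thread uses at least 1GB), otherwise the page cache can inflate the read numbers.

```python
import os
import time

MB = 1024 * 1024


def write_rate(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes to path in block_size requests, fsync, return MB/s."""
    view = memoryview(b"\0" * block_size)
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        written = 0
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            written += f.write(view[:n])  # count what was actually written
        os.fsync(f.fileno())  # make sure data left the client before timing stops
    return total_bytes / MB / max(time.perf_counter() - start, 1e-9)


def read_rate(path: str, block_size: int) -> float:
    """Read path sequentially in block_size requests, return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total / MB / max(time.perf_counter() - start, 1e-9)


def sweep(path: str, total_bytes: int, sizes_kb=(64, 128, 256, 512, 1024)) -> dict:
    """Return {request size in KB: (write MB/s, read MB/s)}."""
    results = {}
    for kb in sizes_kb:
        w = write_rate(path, total_bytes, kb * 1024)
        r = read_rate(path, kb * 1024)
        results[kb] = (w, r)
    return results


# Hypothetical usage against a PVFS mount (file larger than client RAM):
# for kb, (w, r) in sweep("/mnt/pvfs/sweep.dat", 1 << 30).items():
#     print(f"{kb:5d} KB  write {w:8.1f} MB/s  read {r:8.1f} MB/s")
```

Using unbuffered file objects keeps each Python call mapped to one read or write request of the chosen size, which is what the dd `bs=` tests were varying.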

Regards
Claude





Rob Ross wrote:

>On Mon, 8 Mar 2004, Claude Pignol wrote:
>
>>Rob Ross wrote:
>>
>>>Oh, I misunderstood what you were saying before.  I thought that the "few 
>>>MB" was your file size, not your access size.
>>> 
>>The problem is the I/O size, not the file size.
>>
>>>How many I/O servers do you have in the system?  How much memory do you 
>>>have in your client?
>>> 
>>10 I/O servers, 1GB (dedicated to iod)
>
>Clients have this much RAM too?
>
>>>These four /proc values are the default and maximum socket buffer sizes, 
>>>if I understand things correctly:
>>> /proc/sys/net/core/rmem_default
>>> /proc/sys/net/core/rmem_max
>>> /proc/sys/net/core/wmem_default
>>> /proc/sys/net/core/wmem_max
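A minimal Python sketch for checking these four values on each node before and after tuning; the function names are hypothetical, and the values are in bytes. The root directory is a parameter so the same code can read a copy of the files collected from another node.

```python
from pathlib import Path

# The four knobs listed above; the kernel reports them in bytes.
CORE_KNOBS = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")


def read_core_buffers(root: str = "/proc/sys/net/core") -> dict:
    """Return {knob: bytes} read from a /proc/sys/net/core-style directory."""
    base = Path(root)
    return {knob: int((base / knob).read_text().strip()) for knob in CORE_KNOBS}


def report(values: dict) -> list:
    """Format each knob as 'name bytes (KB)' for quick comparison across nodes."""
    return [f"{k:<13} {values[k]:>9} bytes ({values[k] / 1024:.0f} KB)"
            for k in CORE_KNOBS]


# On a Linux node (reading needs no privileges):
# print("\n".join(report(read_core_buffers())))
```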
>>> 
>>r(w)mem_default is 65535
>>r(w)mem_max is 131071
>
>I would adjust these up significantly.  I've seen suggestions of as much 
>as 8MB for wide area; maybe try 1MB and see how that goes?  We're much 
>nicer about socket usage now, so it shouldn't be too much of a resource 
>hog.
>
>I don't think the client adjusts these, so it's going to use the default.  
>The iod *does* adjust these -- see below.
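The difference between a client that never touches its socket buffers and an iod that requests them explicitly can be observed from user space. A minimal Python sketch (the function name is hypothetical): an explicit SO_RCVBUF/SO_SNDBUF request is capped at rmem_max/wmem_max. On current Linux kernels, per socket(7) and tcp(7), getsockopt reports double the granted request (bookkeeping overhead), and a TCP socket that never calls setsockopt starts from the middle value of net.ipv4.tcp_rmem/tcp_wmem rather than rmem_default/wmem_default; 2.4-era kernels may differ in detail.

```python
import socket
from typing import Optional, Tuple


def granted_buffers(requested: Optional[int] = None) -> Tuple[int, int]:
    """Create a TCP socket, optionally request SO_RCVBUF/SO_SNDBUF, and
    return the (receive, send) sizes the kernel reports, in bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if requested is not None:
            # An explicit request is silently capped by net.core.rmem_max / wmem_max.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return (s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))


# print(granted_buffers())             # what an untuned client socket gets
# print(granted_buffers(1024 * 1024))  # what a 1MB request gets under the current cap
```

If the second call reports much less than the requested size, rmem_max/wmem_max is still the limiting factor.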
>
>>>Also, you might want to adjust the following in your iod.conf file (see 
>>>man pages for details): socket_buf, access_size.
>>> 
>>write_buf 512
>>access_size 512
>>socket_buf 64
>
>I would adjust access_size up to some multiple of the new wmem_max so that 
>there is a large enough memory mapped region to fill the buffer with one 
>send.  Likewise for write_buf.
>
>I would adjust socket_buf to be the same as r(w)mem_max, because that is 
>what the iod will use.
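The sizing rule above can be written down as a small helper. This is a minimal sketch with hypothetical function names, and it ASSUMES iod.conf expresses socket_buf, access_size and write_buf in kilobytes; that reading is consistent with the values reported at the top of this thread (r(w)mem_max 1048575 with socket_buf 1024 and access_size/write_buf 4096), but the iod.conf man page is the authority.

```python
def suggest_iod_conf(mem_max_bytes: int, multiple: int = 4) -> dict:
    """Derive iod.conf sizes from r(w)mem_max: socket_buf equal to
    r(w)mem_max, access_size and write_buf a multiple of it.
    ASSUMPTION: iod.conf expresses these sizes in kilobytes."""
    if mem_max_bytes <= 0 or multiple < 1:
        raise ValueError("mem_max_bytes must be positive and multiple >= 1")
    socket_buf_kb = -(-mem_max_bytes // 1024)  # round up to whole KB
    return {
        "socket_buf": socket_buf_kb,
        "access_size": socket_buf_kb * multiple,
        "write_buf": socket_buf_kb * multiple,
    }


def render_iod_conf(settings: dict) -> str:
    """Render settings as 'key value' lines in iod.conf style."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# print(render_iod_conf(suggest_iod_conf(1048575)))
```

Rounding up keeps socket_buf from landing one kilobyte short when r(w)mem_max is set to 2^n - 1 bytes, as it is in this thread.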
>
>>>About where does the dropoff start to occur?
>>> 
>>I/O size of 256KB
>>
>>The read rate is around 4MB/s for 1024KB I/O.
>>
>>Thanks
>>Claude
>
>Let me know if this helps.  Also, as a kick-start for the next stage, what 
>sort of storage do you have on those nodes (single disks, SW RAID, FC 
>attached, ...)?
>
>Thanks,
>
>Rob
>
>>>Regards,
>>>
>>>Rob
>>>
>>>On Mon, 8 Mar 2004, Claude Pignol wrote:
>>>
>>>>Thanks Rob,
>>>>
>>>>Another fact:
>>>>I found that reads work very well with 64KB I/O: the read speed is
>>>>better than the write speed.
>>>>Read performance starts degrading when I increase the I/O size.
>>>>
>>>>I agree that there is a startup cost, but the read-ahead mechanism
>>>>speeds up disk access.
>>>>I am testing with files of at least 1GB.
>>>>
>>>>I have tested with both dynamic buffering (the default) and static
>>>>buffering: same problem.
>>>>How do you increase the TCP buffer size?
>>>>net.ipv4.tcp_rmem
>>>>net.ipv4.tcp_wmem
>>>>net.ipv4.tcp_mem
>>>>
>>>>
>>>>Claude
>>>>
>>>>
>>>>Rob Ross wrote:
>>>>
>>>>>Hi Claude,
>>>>>
>>>>>Sorry we didn't get back to you sooner.  I'm glad that the kernel update 
>>>>>fixed the problem.
>>>>>
>>>>>What block size (bs=XXX) are you using in your tests?
>>>>>
>>>>>Note that when reading, no I/O can start until data is read off disk,
>>>>>while in the write case data can start moving right away.  So you may
>>>>>just be seeing startup costs.
>>>>>
>>>>>You could look at increasing TCP buffer sizes on your system as a first 
>>>>>step.
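On Linux, the TCP knobs asked about in this thread are triples: per tcp(7), net.ipv4.tcp_rmem and net.ipv4.tcp_wmem hold "min default max" in bytes, while net.ipv4.tcp_mem is a system-wide limit measured in pages. A minimal Python sketch (function names hypothetical) that raises the per-socket maxima, and optionally the defaults, alongside net.core.rmem_max/wmem_max, emitting lines that could be appended to /etc/sysctl.conf and loaded with `sysctl -p`; tcp_mem is deliberately left alone.

```python
def raise_triple_max(triple: str, new_max: int, new_default=None) -> str:
    """Take a 'min default max' triple (bytes), as found in net.ipv4.tcp_rmem
    or tcp_wmem, and return it with max (and optionally default) raised."""
    lo, default, hi = (int(x) for x in triple.split())
    if new_default is not None:
        if new_default > new_max:
            raise ValueError("new_default must not exceed new_max")
        default = max(default, new_default)
    return f"{lo} {default} {max(hi, new_max)}"  # never lowers existing values


def sysctl_fragment(tcp_rmem: str, tcp_wmem: str, new_max: int, new_default=None) -> str:
    """Build sysctl.conf lines raising the net.core caps and the TCP limits.
    net.ipv4.tcp_mem (pages, system-wide) is intentionally not touched."""
    return "\n".join([
        f"net.core.rmem_max = {new_max}",
        f"net.core.wmem_max = {new_max}",
        f"net.ipv4.tcp_rmem = {raise_triple_max(tcp_rmem, new_max, new_default)}",
        f"net.ipv4.tcp_wmem = {raise_triple_max(tcp_wmem, new_max, new_default)}",
    ])


# Hypothetical usage on a node, aiming for the 1MB suggested in this thread:
# rmem = open("/proc/sys/net/ipv4/tcp_rmem").read()
# wmem = open("/proc/sys/net/ipv4/tcp_wmem").read()
# print(sysctl_fragment(rmem, wmem, 1 << 20))
```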
>>>>>
>>>>>Regards,
>>>>>
>>>>>Rob
>>>>>
>>>>>On Mon, 8 Mar 2004, Claude Pignol wrote:
>>>>>
>>>>>>Greetings,
>>>>>>
>>>>>>An upgrade to 2.4.21 fixes the problem.
>>>>>>The module compiles and starts OK.
>>>>>>I have noticed a performance problem when reading from PVFS.
>>>>>>With big I/O (a few MB), reading runs at around 1/3 of the write performance.
>>>>>>The PVFS daemons use default parameters.
>>>>>>I am reading/writing from one node to PVFS using dd.
>>>>>>I have verified the disk performance of all 10 I/O nodes.
>>>>>>I have also verified the network performance to all the nodes.
>>>>>>What is the best strategy/tools to address this kind of problem?
>>>>>>Thanks
>>>>>>
>>>>>>
>>>>>>Claude Pignol wrote:
>>>>>>
>>>>>>>Greetings,
>>>>>>>
>>>>>>>I am trying to benchmark PVFS with the SuSE 2.4.19-NUMA kernel
>>>>>>>to compare it with the SuSE 2.4.19-SMP kernel.
>>>>>>>There is no problem compiling and loading the pvfs.o module with the SMP kernel.
>>>>>>>
>>>>>>>With the NUMA kernel, I get 3 undefined symbols when I try to load the
>>>>>>>module:
>>>>>>>pvfs.o: unresolved symbol __pollwait
>>>>>>>pvfs.o: unresolved symbol mem_map
>>>>>>>pvfs.o: unresolved symbol iget4
>>>>>>>
>>>>>>>The kernel source is installed.
>>>>>>>Any idea?
>>>>>>>Thanks in advance
>>>>>>>Claude
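A quick way to narrow down "unresolved symbol" errors on a 2.4 kernel is to check which of the names the module needs actually appear in the running kernel's exported symbol table, /proc/ksyms. A minimal Python sketch with hypothetical function names; it ASSUMES the 2.4 line layout "address name [module]" and that, with CONFIG_MODVERSIONS, names carry a checksum suffix such as `_R1b7d4074` (or `_Rsmp_1b7d4074` on SMP builds), which it strips before comparing.

```python
import re
from pathlib import Path

# ASSUMPTION: modversions suffix looks like "_R" + 8 hex digits, optionally "smp_".
_VERSION_SUFFIX = re.compile(r"_R(?:smp_)?[0-9a-f]{8}$")


def exported_symbols(path: str = "/proc/ksyms") -> set:
    """Collect symbol names from a 2.4-style /proc/ksyms file."""
    names = set()
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:  # "address name" or "address name [module]"
            names.add(_VERSION_SUFFIX.sub("", parts[1]))
    return names


def missing_symbols(wanted, path: str = "/proc/ksyms") -> list:
    """Return the wanted symbols that the running kernel does not export."""
    have = exported_symbols(path)
    return sorted(sym for sym in wanted if sym not in have)


# print(missing_symbols(["__pollwait", "mem_map", "iget4"]))
```

Comparing the output on the SMP and NUMA kernels shows which exports differ between the two builds.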
>>>>>>>
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>PVFS-users mailing list
>>>>>>>PVFS-users at www.beowulf-underground.org
>>>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
>>>>>>>
>>>>>>_______________________________________________
>>>>>>PVFS-developers mailing list
>>>>>>PVFS-developers at www.beowulf-underground.org
>>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers







