[PVFS-developers]
Re: [PVFS-users] Recompile pvfs module for SuSE 2.4.19-NUMA
Claude Pignol
cpignol at seismiccity.com
Mon Mar 8 16:13:46 EST 2004
Rob,
Results by I/O size:
64KB: no problem
128KB: no problem
256KB: write is fine, but read is 10 times slower.
Tuning the parameters improves performance when pvfs is working
normally, but with 256KB I/O pvfs doesn't behave normally.
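For anyone who wants to reproduce the comparison without dd, here is a minimal sketch that reads one file sequentially at several request sizes and reports the rate. The function name timed_read and the temporary test file are illustrative assumptions, not part of pvfs. A small local file will be served from the page cache, so the numbers only mean something when the path points at a large file on the pvfs mount.

```python
import os
import tempfile
import time


def timed_read(path, block_size):
    """Read `path` sequentially in `block_size`-byte requests.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 so that each read() is issued with the chosen request size
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in file; replace with a large file on the pvfs mount.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
        path = tmp.name
    try:
        for kb in (64, 128, 256, 1024):
            nbytes, secs = timed_read(path, kb * 1024)
            rate = nbytes / secs / 1e6 if secs > 0 else float("inf")
            print(f"{kb:5d}KB reads: {nbytes} bytes, {rate:.1f} MB/s")
    finally:
        os.unlink(path)
```

The request sizes in the loop match the ones discussed in this thread (64KB, 128KB, 256KB, 1024KB).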
The current parameters are
r(w)mem_max 1048575
write_buf 4096
access_size 4096
socket_buf 1024
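As a quick sanity check of these values against the advice quoted below (socket_buf the same as r(w)mem_max, access_size and write_buf some multiple of it), here is a small sketch. It assumes that the iod.conf values are in kilobytes and that r(w)mem_max is in bytes; the man pages are the place to confirm those units. The helper name check_iod_settings is made up for this example.

```python
KB = 1024


def check_iod_settings(mem_max_bytes, write_buf_kb, access_size_kb, socket_buf_kb):
    """Compare iod.conf values (ASSUMED to be in KB) with r(w)mem_max (bytes)."""
    socket_buf = socket_buf_kb * KB
    return {
        # bytes by which socket_buf exceeds the kernel maximum (<= 0 means it fits)
        "socket_buf_over_max": socket_buf - mem_max_bytes,
        # how many maximum-size socket buffers fit into each region
        "access_size_ratio": access_size_kb * KB / mem_max_bytes,
        "write_buf_ratio": write_buf_kb * KB / mem_max_bytes,
    }


# Values taken from the parameter list above.
report = check_iod_settings(
    mem_max_bytes=1048575, write_buf_kb=4096, access_size_kb=4096, socket_buf_kb=1024
)
print(report)
```

Under the kilobyte assumption, socket_buf comes out one byte larger than r(w)mem_max, and access_size and write_buf are about four times the maximum. As far as I know, Linux caps socket buffer requests at the configured maximum, so the one-byte difference is probably harmless.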
There are no error messages in the pvfs log.
Disks: RAID, able to deliver 30MB/s, dedicated to pvfs data.
Regards
Claude
Rob Ross wrote:
>On Mon, 8 Mar 2004, Claude Pignol wrote:
>
>
>
>>Rob Ross wrote:
>>
>>
>>
>>>Oh, I misunderstood what you were saying before. I thought that the "few
>>>MB" was your file size, not your access size.
>>>
>>>
>>>
>>>
>>The problem is the I/O size not the file size.
>>
>>
>>
>>>How many I/O servers do you have in the system? How much memory do you
>>>have in your client?
>>>
>>>
>>>
>>>
>>10 I/O servers, 1GB (dedicated for iod)
>>
>>
>
>Clients have this much RAM too?
>
>
>
>>>These four /proc values are the default and maximum socket buffer sizes,
>>>if I understand things correctly:
>>> /proc/sys/net/core/rmem_default
>>> /proc/sys/net/core/rmem_max
>>> /proc/sys/net/core/wmem_default
>>> /proc/sys/net/core/wmem_max
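A minimal sketch for collecting those four values on each node, so they can be compared across clients and I/O servers. The function name and the `base` parameter are assumptions for illustration; on a system without these /proc files every entry is reported as None.

```python
from pathlib import Path

NAMES = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")


def read_core_buffer_sizes(base="/proc/sys/net/core"):
    """Return {name: bytes} for the four settings; None if a file is unreadable."""
    sizes = {}
    for name in NAMES:
        try:
            sizes[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            sizes[name] = None  # e.g. not running on Linux
    return sizes


if __name__ == "__main__":
    for name, value in read_core_buffer_sizes().items():
        print(f"{name}: {value}")
```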
>>>
>>>
>>>
>>>
>>r(w)mem_default is 65535
>>r(w)mem_max is 131071
>>
>>
>
>I would adjust these up significantly. I've seen suggestions of as much
>as 8MB for wide area; maybe try 1MB and see how that goes? We're much
>nicer about socket usage now, so it shouldn't be too much of a resource
>hog.
>
>I don't think the client adjusts these, so it's going to use the default.
>The iod *does* adjust these -- see below.
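To illustrate the difference between a program that leaves the socket buffers alone and one that asks for a specific size, here is a small sketch using the standard SO_RCVBUF and SO_SNDBUF socket options. It is not taken from the pvfs client or iod source; effective_buffers is a made-up helper that only reports what the kernel grants.

```python
import socket


def effective_buffers(requested=None):
    """Return (rcvbuf, sndbuf) the kernel reports for a new TCP socket.

    If `requested` is given, ask for that many bytes for both buffers first.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        return (
            s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
            s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        )


if __name__ == "__main__":
    print("default:  ", effective_buffers())
    print("requested:", effective_buffers(1024 * 1024))
```

On Linux the reported size is typically double the requested one, because the kernel adds room for bookkeeping, and the request is capped by rmem_max and wmem_max. That is why the maximums have to be raised before a larger request can take effect.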
>
>
>
>>>Also, you might want to adjust the following in your iod.conf file (see
>>>man pages for details): socket_buf, access_size.
>>>
>>>
>>>
>>>
>>write_buf 512
>>access_size 512
>>socket_buf 64
>>
>>
>
>I would adjust access_size up to some multiple of the new wmem_max so that
>there is a large enough memory mapped region to fill the buffer with one
>send. Likewise for write_buf.
>
>I would adjust socket_buf to be the same as r(w)mem_max, because that is
>what the iod will use.
>
>
>
>>>About where does the dropoff start to occur?
>>>
>>>
>>>
>>>
>>I/O size of 256KB
>>
>>The read rate is around 4MB/s for I/O of 1024K
>>
>>Thanks
>>Claude
>>
>>
>
>Let me know if this helps. Also, as a kick-start for the next stage, what
>sort of storage do you have on those nodes (single disks, SW RAID, FC
>attached, ...)?
>
>Thanks,
>
>Rob
>
>
>
>>>Regards,
>>>
>>>Rob
>>>
>>>On Mon, 8 Mar 2004, Claude Pignol wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Thanks Rob,
>>>>
>>>>Another fact:
>>>>I found that reads work very well with 64K I/O: the read speed is
>>>>better than the write speed.
>>>>The read performance starts degrading when I increase the I/O size.
>>>>
>>>>I agree that there is a startup cost, but there is also the read-ahead
>>>>mechanism, which speeds up disk access.
>>>>I am testing with files of at least 1GB.
>>>>
>>>>I have tested with dynamic buffering (the default) and the static buffering.
>>>>Same problem.
>>>>How do you increase tcp buffer size?
>>>>net.ipv4.tcp_rmem
>>>>net.ipv4.tcp_wmem
>>>>net.ipv4.tcp_mem
>>>>
>>>>
>>>>Claude
>>>>
>>>>
>>>>Rob Ross wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Hi Claude,
>>>>>
>>>>>Sorry we didn't get back to you sooner. I'm glad that the kernel update
>>>>>fixed the problem.
>>>>>
>>>>>What block size (bs=XXX) are you using in your tests?
>>>>>
>>>>>Note that when reading no I/O can start until data is read off disk, while
>>>>>in the write case data can start moving right away. So you may just be
>>>>>seeing startup costs.
>>>>>
>>>>>You could look at increasing TCP buffer sizes on your system as a first
>>>>>step.
>>>>>
>>>>>Regards,
>>>>>
>>>>>Rob
>>>>>
>>>>>On Mon, 8 Mar 2004, Claude Pignol wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Greetings,
>>>>>>
>>>>>>An upgrade to 2.4.21 fixes the problem.
>>>>>>Compile and start OK.
>>>>>>I have noticed a performance problem when reading from PVFS.
>>>>>>With big I/O (a few MB), reading runs at around 1/3 of the write performance.
>>>>>>The pvfs daemons are running with default parameters.
>>>>>>I am reading/writing from one node to pvfs using dd.
>>>>>>I have verified the disk performance of all 10 I/O nodes.
>>>>>>I have also verified the network performance to all the nodes.
>>>>>>What is the best strategy/tool to address this kind of problem?
>>>>>>Thanks
>>>>>>
>>>>>>
>>>>>>Claude Pignol wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Greetings,
>>>>>>>
>>>>>>>I am trying to benchmark pvfs with the SuSE 2.4.19-NUMA kernel
>>>>>>>to compare it with the SuSE 2.4.19-SMP kernel.
>>>>>>>The pvfs.o module compiles and loads without problems with the SMP kernel.
>>>>>>>
>>>>>>>With the NUMA kernel I get 3 undefined symbols when I try to load the
>>>>>>>module:
>>>>>>>pvfs.o: unresolved symbol __pollwait
>>>>>>>pvfs.o: unresolved symbol mem_map
>>>>>>>pvfs.o: unresolved symbol iget4
>>>>>>>
>>>>>>>The kernel source is installed.
>>>>>>>Any idea?
>>>>>>>Thanks in advance
>>>>>>>Claude
>>>>>>>
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>PVFS-users mailing list
>>>>>>>PVFS-users at www.beowulf-underground.org
>>>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>_______________________________________________
>>>>>>PVFS-developers mailing list
>>>>>>PVFS-developers at www.beowulf-underground.org
>>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers