[Pvfs2-developers] PVFS2 Performance Problem -
David Bonnie
dbonnie at clemson.edu
Wed Jul 1 18:45:04 EDT 2009
Sam -
Here's the output from netpipe between one client and one server:
[root@lps-246 bin]# ./nplaunch ../NPtcp -h lps-246-compute-2
../NPtcp -h lps-246-compute-2
[1] 4534
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
0: 1 bytes 1964 times --> 0.15 Mbps in 51.87 usec
1: 2 bytes 1927 times --> 0.29 Mbps in 51.95 usec
2: 3 bytes 1924 times --> 0.44 Mbps in 51.93 usec
3: 4 bytes 1283 times --> 0.59 Mbps in 51.84 usec
4: 6 bytes 1446 times --> 0.88 Mbps in 51.87 usec
5: 8 bytes 964 times --> 1.18 Mbps in 51.89 usec
6: 12 bytes 1204 times --> 1.76 Mbps in 51.88 usec
7: 13 bytes 803 times --> 1.91 Mbps in 51.79 usec
8: 16 bytes 891 times --> 2.35 Mbps in 52.02 usec
9: 19 bytes 1081 times --> 2.79 Mbps in 52.01 usec
10: 21 bytes 1214 times --> 3.07 Mbps in 52.13 usec
11: 24 bytes 1278 times --> 3.52 Mbps in 52.01 usec
12: 27 bytes 1361 times --> 3.96 Mbps in 52.04 usec
13: 29 bytes 853 times --> 4.25 Mbps in 52.04 usec
14: 32 bytes 927 times --> 4.69 Mbps in 52.07 usec
15: 35 bytes 1020 times --> 5.14 Mbps in 52.00 usec
16: 45 bytes 1098 times --> 6.58 Mbps in 52.20 usec
17: 48 bytes 1277 times --> 7.01 Mbps in 52.21 usec
18: 51 bytes 1316 times --> 7.47 Mbps in 52.11 usec
19: 61 bytes 752 times --> 8.91 Mbps in 52.23 usec
20: 64 bytes 941 times --> 9.34 Mbps in 52.30 usec
21: 67 bytes 985 times --> 9.78 Mbps in 52.26 usec
22: 93 bytes 1028 times --> 13.50 Mbps in 52.57 usec
23: 96 bytes 1268 times --> 13.93 Mbps in 52.56 usec
24: 99 bytes 1288 times --> 14.37 Mbps in 52.57 usec
25: 125 bytes 691 times --> 18.13 Mbps in 52.59 usec
26: 128 bytes 943 times --> 18.54 Mbps in 52.68 usec
27: 131 bytes 963 times --> 18.97 Mbps in 52.69 usec
28: 189 bytes 985 times --> 26.85 Mbps in 53.70 usec
29: 192 bytes 1241 times --> 27.16 Mbps in 53.94 usec
30: 195 bytes 1245 times --> 27.40 Mbps in 54.30 usec
31: 253 bytes 642 times --> 31.53 Mbps in 61.23 usec
32: 256 bytes 813 times --> 31.89 Mbps in 61.25 usec
33: 259 bytes 822 times --> 32.82 Mbps in 60.20 usec
34: 381 bytes 846 times --> 43.84 Mbps in 66.31 usec
35: 384 bytes 1005 times --> 44.14 Mbps in 66.37 usec
36: 387 bytes 1008 times --> 44.35 Mbps in 66.57 usec
37: 509 bytes 512 times --> 55.88 Mbps in 69.49 usec
38: 512 bytes 718 times --> 56.08 Mbps in 69.66 usec
39: 515 bytes 720 times --> 56.42 Mbps in 69.64 usec
40: 765 bytes 724 times --> 77.61 Mbps in 75.20 usec
41: 768 bytes 886 times --> 77.89 Mbps in 75.22 usec
42: 771 bytes 887 times --> 78.17 Mbps in 75.25 usec
43: 1021 bytes 448 times --> 95.64 Mbps in 81.45 usec
44: 1024 bytes 613 times --> 96.04 Mbps in 81.35 usec
45: 1027 bytes 615 times --> 96.29 Mbps in 81.37 usec
46: 1533 bytes 617 times --> 118.90 Mbps in 98.37 usec
47: 1536 bytes 677 times --> 118.75 Mbps in 98.68 usec
48: 1539 bytes 676 times --> 119.00 Mbps in 98.67 usec
49: 2045 bytes 339 times --> 153.16 Mbps in 101.87 usec
50: 2048 bytes 490 times --> 152.82 Mbps in 102.25 usec
51: 2051 bytes 489 times --> 153.41 Mbps in 102.00 usec
52: 3069 bytes 491 times --> 195.25 Mbps in 119.92 usec
53: 3072 bytes 555 times --> 195.44 Mbps in 119.92 usec
54: 3075 bytes 556 times --> 196.04 Mbps in 119.67 usec
55: 4093 bytes 279 times --> 241.11 Mbps in 129.52 usec
56: 4096 bytes 385 times --> 241.18 Mbps in 129.57 usec
57: 4099 bytes 386 times --> 241.85 Mbps in 129.31 usec
58: 6141 bytes 387 times --> 313.92 Mbps in 149.25 usec
59: 6144 bytes 446 times --> 313.39 Mbps in 149.57 usec
60: 6147 bytes 445 times --> 313.58 Mbps in 149.55 usec
61: 8189 bytes 223 times --> 376.78 Mbps in 165.82 usec
62: 8192 bytes 301 times --> 376.76 Mbps in 165.89 usec
63: 8195 bytes 301 times --> 377.01 Mbps in 165.84 usec
64: 12285 bytes 301 times --> 466.20 Mbps in 201.04 usec
65: 12288 bytes 331 times --> 467.01 Mbps in 200.75 usec
66: 12291 bytes 332 times --> 467.81 Mbps in 200.45 usec
67: 16381 bytes 166 times --> 525.68 Mbps in 237.74 usec
68: 16384 bytes 210 times --> 526.26 Mbps in 237.53 usec
69: 16387 bytes 210 times --> 526.45 Mbps in 237.48 usec
70: 24573 bytes 210 times --> 606.69 Mbps in 309.02 usec
71: 24576 bytes 215 times --> 605.94 Mbps in 309.43 usec
72: 24579 bytes 215 times --> 606.69 Mbps in 309.09 usec
73: 32765 bytes 107 times --> 656.41 Mbps in 380.82 usec
74: 32768 bytes 131 times --> 654.14 Mbps in 382.18 usec
75: 32771 bytes 130 times --> 655.71 Mbps in 381.30 usec
76: 49149 bytes 131 times --> 717.66 Mbps in 522.50 usec
77: 49152 bytes 127 times --> 718.85 Mbps in 521.67 usec
78: 49155 bytes 127 times --> 716.82 Mbps in 523.17 usec
79: 65533 bytes 63 times --> 749.16 Mbps in 667.38 usec
80: 65536 bytes 74 times --> 750.34 Mbps in 666.36 usec
81: 65539 bytes 75 times --> 748.70 Mbps in 667.85 usec
82: 98301 bytes 74 times --> 796.11 Mbps in 942.05 usec
83: 98304 bytes 70 times --> 797.44 Mbps in 940.52 usec
84: 98307 bytes 70 times --> 796.58 Mbps in 941.56 usec
85: 131069 bytes 35 times --> 819.79 Mbps in 1219.80 usec
86: 131072 bytes 40 times --> 819.94 Mbps in 1219.60 usec
87: 131075 bytes 40 times --> 820.30 Mbps in 1219.09 usec
88: 196605 bytes 41 times --> 839.50 Mbps in 1786.76 usec
89: 196608 bytes 37 times --> 839.81 Mbps in 1786.12 usec
90: 196611 bytes 37 times --> 840.53 Mbps in 1784.61 usec
91: 262141 bytes 18 times --> 851.70 Mbps in 2348.22 usec
92: 262144 bytes 21 times --> 852.22 Mbps in 2346.81 usec
93: 262147 bytes 21 times --> 852.35 Mbps in 2346.48 usec
94: 393213 bytes 21 times --> 864.02 Mbps in 3472.12 usec
95: 393216 bytes 19 times --> 864.67 Mbps in 3469.55 usec
96: 393219 bytes 19 times --> 863.81 Mbps in 3473.02 usec
97: 524285 bytes 9 times --> 871.33 Mbps in 4590.67 usec
98: 524288 bytes 10 times --> 871.13 Mbps in 4591.75 usec
99: 524291 bytes 10 times --> 871.46 Mbps in 4590.00 usec
100: 786429 bytes 10 times --> 878.64 Mbps in 6828.69 usec
101: 786432 bytes 9 times --> 879.35 Mbps in 6823.22 usec
102: 786435 bytes 9 times --> 879.40 Mbps in 6822.89 usec
103: 1048573 bytes 4 times --> 883.66 Mbps in 9053.23 usec
104: 1048576 bytes 5 times --> 884.31 Mbps in 9046.60 usec
105: 1048579 bytes 5 times --> 884.45 Mbps in 9045.20 usec
106: 1572861 bytes 5 times --> 888.60 Mbps in 13504.41 usec
107: 1572864 bytes 4 times --> 888.71 Mbps in 13502.75 usec
108: 1572867 bytes 4 times --> 888.76 Mbps in 13502.00 usec
109: 2097149 bytes 3 times --> 891.10 Mbps in 17955.34 usec
110: 2097152 bytes 3 times --> 891.30 Mbps in 17951.33 usec
111: 2097155 bytes 3 times --> 891.17 Mbps in 17954.03 usec
112: 3145725 bytes 3 times --> 893.47 Mbps in 26861.51 usec
113: 3145728 bytes 3 times --> 893.33 Mbps in 26865.84 usec
114: 3145731 bytes 3 times --> 893.47 Mbps in 26861.47 usec
115: 4194301 bytes 3 times --> 894.52 Mbps in 35773.16 usec
116: 4194304 bytes 3 times --> 894.50 Mbps in 35774.15 usec
117: 4194307 bytes 3 times --> 894.55 Mbps in 35772.16 usec
118: 6291453 bytes 3 times --> 895.59 Mbps in 53596.18 usec
119: 6291456 bytes 3 times --> 895.64 Mbps in 53593.16 usec
120: 6291459 bytes 3 times --> 895.58 Mbps in 53596.34 usec
121: 8388605 bytes 3 times --> 896.17 Mbps in 71414.67 usec
122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec
123: 8388611 bytes 3 times --> 896.14 Mbps in 71417.49 usec
I'll run it on each node and let you know if anything is out of place. I
believe the above results are fine for GigE, yes?
- Dave
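
One way to sanity-check the figures above is to recompute the Mbps column from the byte count and the time. The sketch below assumes that the tool counts 2^20 bits per megabit; that is inferred from the numbers in this log, not taken from NetPIPE documentation, and the function names are illustrative only. It also converts the same rows to decimal megabits for comparison with the nominal 1000 Mbit/s GigE rate.

```python
def netpipe_mbps(nbytes, usec):
    """Throughput with 1 Mb = 2**20 bits (assumption inferred from the log)."""
    return nbytes * 8 / 2**20 / (usec * 1e-6)


def decimal_mbps(nbytes, usec):
    """Throughput with 1 Mb = 10**6 bits; bits per microsecond equals Mbit/s."""
    return nbytes * 8 / usec


# Rows copied from the output above: (bytes, usec per transfer, reported Mbps)
rows = [(1024, 81.35, 96.04), (8388608, 71413.99, 896.18)]

for nbytes, usec, reported in rows:
    print(f"{nbytes:>8} B  reported {reported:7.2f}  "
          f"recomputed {netpipe_mbps(nbytes, usec):7.2f}  "
          f"decimal {decimal_mbps(nbytes, usec):7.2f}")
```

Under that assumption the 8 MB row works out to roughly 940 x 10^6 bit/s, about 94% of the nominal link rate, while the small-message rows show a floor of about 52 usec per transfer.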
On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <slang at mcs.anl.gov> wrote:
>
> David,
> It sounds like your initial thought (that there is a network
> problem) could be correct. I would probably explore that first. What sort
> of numbers do you get from netpipe runs (or even bmi_pingpong) between
> client and server?
>
> -sam
>
> On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:
>
> Sorry for not being clear.
>
> The hardware and software are unchanged. Runs from a few months ago (on
> 2.8.0) performed as expected. Current runs (on both 2.8.0 and 2.8.1) are
> slow.
>
> The nodes are sitting there with very low CPU usage even when running the
> benchmark. I'm the only one running any jobs and there aren't any other
> processes running (the system load is < 0.02 and CPU usage is close to 0%).
>
> The local disks haven't changed and are empty except for the PVFS2 storage
> space; performance is bad even when I put the PVFS2 file system storage onto
> a very fast (>300 MB/s local bandwidth) Atrato vlun connected over Fibre
> Channel.
>
> My initial thought is that some hardware along the line died but I can't
> seem to pinpoint it. All of the network interfaces show 0 errors and 0
> dropped packets.
>
> - Dave
>
> On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>
>> Hi David,
>>
>> I still don't get it: when was the performance good? Same software and
>> hardware, just some time in the past? Or is there a software change?
>>
>> The nodes aren't being used for anything else, there are no rogue
>> processes, and the local file systems are otherwise empty?
>>
>> Thanks,
>>
>> Rob
>>
>> On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:
>>
>> Rob -
>>>
>>> Performance is down across all PVFS2 installations. The benchmark simply
>>> creates files of a random size (between 1 and 25 MB) in a single folder on
>>> the mounted PVFS2 partition, 16 KB at a time. It's not anywhere near ideal,
>>> but it's the workload I'm working with.
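
The workload described above can be approximated with a short script. This is a minimal sketch, not the benchmark actually used in the thread: the function names, file naming scheme, and zero-filled data are assumptions made for illustration. It writes files of a random size between 1 and 25 MB into one directory, 16 KB per write call, and reports the aggregate write rate.

```python
import os
import random
import tempfile
import time

CHUNK = 16 * 1024            # 16 KB per write call, as described
MIN_SIZE = 1 * 1024 * 1024   # 1 MB
MAX_SIZE = 25 * 1024 * 1024  # 25 MB


def write_file(path, size, chunk=CHUNK):
    """Write `size` zero bytes to `path`, at most `chunk` bytes per call."""
    buf = b"\0" * chunk
    written = 0
    # buffering=0 so each f.write() maps to one unbuffered write
    with open(path, "wb", buffering=0) as f:
        while written < size:
            n = min(chunk, size - written)
            written += f.write(buf[:n])
    return written


def run(directory, num_files, min_size=MIN_SIZE, max_size=MAX_SIZE, seed=None):
    """Create `num_files` files in `directory`; return (total_bytes, seconds)."""
    rng = random.Random(seed)
    total = 0
    start = time.perf_counter()
    for i in range(num_files):
        size = rng.randint(min_size, max_size)
        total += write_file(os.path.join(directory, f"file_{i:06d}"), size)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Demo in a temporary directory with small sizes; point `directory`
    # at the mounted file system and use the defaults for a real run.
    with tempfile.TemporaryDirectory() as d:
        total, secs = run(d, 4, min_size=1, max_size=256 * 1024, seed=0)
        print(f"{total} bytes in {secs:.3f} s = {total / secs / 1e6:.1f} MB/s")
```

Timing here covers file creation and writes together, so metadata latency and data throughput are not separated.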
>>>
>>> Prior to this problem we were getting ~22 MB/s write throughput and we're
>>> down to about 2.5 MB/s for no apparent reason. Reads are down from about 55
>>> MB/s to 30 MB/s. No hardware has changed and as far as I can tell no
>>> hardware has died either.
>>>
>>> - Dave
>>>
>>>
>>> On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>>> Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe the
>>> benchmark and how you are doing your measurements?
>>>
>>> Rob
>>>
>>>
>>> On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:
>>>
>>> Hello all -
>>>
>>> I'm having trouble figuring out a problem with performance degradation on
>>> a simple 10-node cluster. Prior runs on the cluster (before this problem
>>> manifested itself) resulted in bandwidth and IOPS about 10 times higher on a
>>> small file creation workload. Each node is running as a metadata server and
>>> a data server.
>>>
>>> The problem is persistent between versions and installations of PVFS2
>>> 2.8.0 and 2.8.1. Rebooting all of the nodes didn't improve anything. The
>>> network connections (simple GigE) showed no errors or dropped packets.
>>> Using different physical disks (both SAS and FC) didn't improve things.
>>> The kernel logs didn't show anything out of place nor did the pvfs2 server
>>> or client logs. It seems like a network issue but I can't seem to find
>>> anything wrong with any of the connections.
>>>
>>> Has anyone seen this kind of problem before? I seem to remember
>>> something on the list before about performance suddenly dropping but I can't
>>> find the message now (of course). Any insight would be appreciated!
>>>
>>> Thanks,
>>>
>>> - Dave
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>>
>>>
>>