Sam -<br><br>Here's the output from netpipe between one client and one server:<br><br>[root@lps-246 bin]# ./nplaunch ../NPtcp -h lps-246-compute-2<br><br>../NPtcp -h lps-246-compute-2<br><br>[1] 4534<br>Send and receive buffers are 16384 and 87380 bytes<br>
(A bug in Linux doubles the requested buffer sizes)<br>Send and receive buffers are 16384 and 87380 bytes<br>(A bug in Linux doubles the requested buffer sizes)<br>Now starting the main loop<br> 0: 1 bytes 1964 times --> 0.15 Mbps in 51.87 usec<br>
1: 2 bytes 1927 times --> 0.29 Mbps in 51.95 usec<br> 2: 3 bytes 1924 times --> 0.44 Mbps in 51.93 usec<br> 3: 4 bytes 1283 times --> 0.59 Mbps in 51.84 usec<br>
4: 6 bytes 1446 times --> 0.88 Mbps in 51.87 usec<br> 5: 8 bytes 964 times --> 1.18 Mbps in 51.89 usec<br> 6: 12 bytes 1204 times --> 1.76 Mbps in 51.88 usec<br>
7: 13 bytes 803 times --> 1.91 Mbps in 51.79 usec<br> 8: 16 bytes 891 times --> 2.35 Mbps in 52.02 usec<br> 9: 19 bytes 1081 times --> 2.79 Mbps in 52.01 usec<br>
10: 21 bytes 1214 times --> 3.07 Mbps in 52.13 usec<br> 11: 24 bytes 1278 times --> 3.52 Mbps in 52.01 usec<br> 12: 27 bytes 1361 times --> 3.96 Mbps in 52.04 usec<br>
13: 29 bytes 853 times --> 4.25 Mbps in 52.04 usec<br> 14: 32 bytes 927 times --> 4.69 Mbps in 52.07 usec<br> 15: 35 bytes 1020 times --> 5.14 Mbps in 52.00 usec<br>
16: 45 bytes 1098 times --> 6.58 Mbps in 52.20 usec<br> 17: 48 bytes 1277 times --> 7.01 Mbps in 52.21 usec<br> 18: 51 bytes 1316 times --> 7.47 Mbps in 52.11 usec<br>
19: 61 bytes 752 times --> 8.91 Mbps in 52.23 usec<br> 20: 64 bytes 941 times --> 9.34 Mbps in 52.30 usec<br> 21: 67 bytes 985 times --> 9.78 Mbps in 52.26 usec<br>
22: 93 bytes 1028 times --> 13.50 Mbps in 52.57 usec<br> 23: 96 bytes 1268 times --> 13.93 Mbps in 52.56 usec<br> 24: 99 bytes 1288 times --> 14.37 Mbps in 52.57 usec<br>
25: 125 bytes 691 times --> 18.13 Mbps in 52.59 usec<br> 26: 128 bytes 943 times --> 18.54 Mbps in 52.68 usec<br> 27: 131 bytes 963 times --> 18.97 Mbps in 52.69 usec<br>
28: 189 bytes 985 times --> 26.85 Mbps in 53.70 usec<br> 29: 192 bytes 1241 times --> 27.16 Mbps in 53.94 usec<br> 30: 195 bytes 1245 times --> 27.40 Mbps in 54.30 usec<br>
31: 253 bytes 642 times --> 31.53 Mbps in 61.23 usec<br> 32: 256 bytes 813 times --> 31.89 Mbps in 61.25 usec<br> 33: 259 bytes 822 times --> 32.82 Mbps in 60.20 usec<br>
34: 381 bytes 846 times --> 43.84 Mbps in 66.31 usec<br> 35: 384 bytes 1005 times --> 44.14 Mbps in 66.37 usec<br> 36: 387 bytes 1008 times --> 44.35 Mbps in 66.57 usec<br>
37: 509 bytes 512 times --> 55.88 Mbps in 69.49 usec<br> 38: 512 bytes 718 times --> 56.08 Mbps in 69.66 usec<br> 39: 515 bytes 720 times --> 56.42 Mbps in 69.64 usec<br>
40: 765 bytes 724 times --> 77.61 Mbps in 75.20 usec<br> 41: 768 bytes 886 times --> 77.89 Mbps in 75.22 usec<br> 42: 771 bytes 887 times --> 78.17 Mbps in 75.25 usec<br>
43: 1021 bytes 448 times --> 95.64 Mbps in 81.45 usec<br> 44: 1024 bytes 613 times --> 96.04 Mbps in 81.35 usec<br> 45: 1027 bytes 615 times --> 96.29 Mbps in 81.37 usec<br>
46: 1533 bytes 617 times --> 118.90 Mbps in 98.37 usec<br> 47: 1536 bytes 677 times --> 118.75 Mbps in 98.68 usec<br> 48: 1539 bytes 676 times --> 119.00 Mbps in 98.67 usec<br>
49: 2045 bytes 339 times --> 153.16 Mbps in 101.87 usec<br> 50: 2048 bytes 490 times --> 152.82 Mbps in 102.25 usec<br> 51: 2051 bytes 489 times --> 153.41 Mbps in 102.00 usec<br>
52: 3069 bytes 491 times --> 195.25 Mbps in 119.92 usec<br> 53: 3072 bytes 555 times --> 195.44 Mbps in 119.92 usec<br> 54: 3075 bytes 556 times --> 196.04 Mbps in 119.67 usec<br>
55: 4093 bytes 279 times --> 241.11 Mbps in 129.52 usec<br> 56: 4096 bytes 385 times --> 241.18 Mbps in 129.57 usec<br> 57: 4099 bytes 386 times --> 241.85 Mbps in 129.31 usec<br>
58: 6141 bytes 387 times --> 313.92 Mbps in 149.25 usec<br> 59: 6144 bytes 446 times --> 313.39 Mbps in 149.57 usec<br> 60: 6147 bytes 445 times --> 313.58 Mbps in 149.55 usec<br>
61: 8189 bytes 223 times --> 376.78 Mbps in 165.82 usec<br> 62: 8192 bytes 301 times --> 376.76 Mbps in 165.89 usec<br> 63: 8195 bytes 301 times --> 377.01 Mbps in 165.84 usec<br>
64: 12285 bytes 301 times --> 466.20 Mbps in 201.04 usec<br> 65: 12288 bytes 331 times --> 467.01 Mbps in 200.75 usec<br> 66: 12291 bytes 332 times --> 467.81 Mbps in 200.45 usec<br>
67: 16381 bytes 166 times --> 525.68 Mbps in 237.74 usec<br> 68: 16384 bytes 210 times --> 526.26 Mbps in 237.53 usec<br> 69: 16387 bytes 210 times --> 526.45 Mbps in 237.48 usec<br>
70: 24573 bytes 210 times --> 606.69 Mbps in 309.02 usec<br> 71: 24576 bytes 215 times --> 605.94 Mbps in 309.43 usec<br> 72: 24579 bytes 215 times --> 606.69 Mbps in 309.09 usec<br>
73: 32765 bytes 107 times --> 656.41 Mbps in 380.82 usec<br> 74: 32768 bytes 131 times --> 654.14 Mbps in 382.18 usec<br> 75: 32771 bytes 130 times --> 655.71 Mbps in 381.30 usec<br>
76: 49149 bytes 131 times --> 717.66 Mbps in 522.50 usec<br> 77: 49152 bytes 127 times --> 718.85 Mbps in 521.67 usec<br> 78: 49155 bytes 127 times --> 716.82 Mbps in 523.17 usec<br>
79: 65533 bytes 63 times --> 749.16 Mbps in 667.38 usec<br> 80: 65536 bytes 74 times --> 750.34 Mbps in 666.36 usec<br> 81: 65539 bytes 75 times --> 748.70 Mbps in 667.85 usec<br>
82: 98301 bytes 74 times --> 796.11 Mbps in 942.05 usec<br> 83: 98304 bytes 70 times --> 797.44 Mbps in 940.52 usec<br> 84: 98307 bytes 70 times --> 796.58 Mbps in 941.56 usec<br>
85: 131069 bytes 35 times --> 819.79 Mbps in 1219.80 usec<br> 86: 131072 bytes 40 times --> 819.94 Mbps in 1219.60 usec<br> 87: 131075 bytes 40 times --> 820.30 Mbps in 1219.09 usec<br>
88: 196605 bytes 41 times --> 839.50 Mbps in 1786.76 usec<br> 89: 196608 bytes 37 times --> 839.81 Mbps in 1786.12 usec<br> 90: 196611 bytes 37 times --> 840.53 Mbps in 1784.61 usec<br>
91: 262141 bytes 18 times --> 851.70 Mbps in 2348.22 usec<br> 92: 262144 bytes 21 times --> 852.22 Mbps in 2346.81 usec<br> 93: 262147 bytes 21 times --> 852.35 Mbps in 2346.48 usec<br>
94: 393213 bytes 21 times --> 864.02 Mbps in 3472.12 usec<br> 95: 393216 bytes 19 times --> 864.67 Mbps in 3469.55 usec<br> 96: 393219 bytes 19 times --> 863.81 Mbps in 3473.02 usec<br>
97: 524285 bytes 9 times --> 871.33 Mbps in 4590.67 usec<br> 98: 524288 bytes 10 times --> 871.13 Mbps in 4591.75 usec<br> 99: 524291 bytes 10 times --> 871.46 Mbps in 4590.00 usec<br>
100: 786429 bytes 10 times --> 878.64 Mbps in 6828.69 usec<br>101: 786432 bytes 9 times --> 879.35 Mbps in 6823.22 usec<br>102: 786435 bytes 9 times --> 879.40 Mbps in 6822.89 usec<br>
103: 1048573 bytes 4 times --> 883.66 Mbps in 9053.23 usec<br>104: 1048576 bytes 5 times --> 884.31 Mbps in 9046.60 usec<br>105: 1048579 bytes 5 times --> 884.45 Mbps in 9045.20 usec<br>
106: 1572861 bytes 5 times --> 888.60 Mbps in 13504.41 usec<br>107: 1572864 bytes 4 times --> 888.71 Mbps in 13502.75 usec<br>108: 1572867 bytes 4 times --> 888.76 Mbps in 13502.00 usec<br>
109: 2097149 bytes 3 times --> 891.10 Mbps in 17955.34 usec<br>110: 2097152 bytes 3 times --> 891.30 Mbps in 17951.33 usec<br>111: 2097155 bytes 3 times --> 891.17 Mbps in 17954.03 usec<br>
112: 3145725 bytes 3 times --> 893.47 Mbps in 26861.51 usec<br>113: 3145728 bytes 3 times --> 893.33 Mbps in 26865.84 usec<br>114: 3145731 bytes 3 times --> 893.47 Mbps in 26861.47 usec<br>
115: 4194301 bytes 3 times --> 894.52 Mbps in 35773.16 usec<br>116: 4194304 bytes 3 times --> 894.50 Mbps in 35774.15 usec<br>117: 4194307 bytes 3 times --> 894.55 Mbps in 35772.16 usec<br>
118: 6291453 bytes 3 times --> 895.59 Mbps in 53596.18 usec<br>119: 6291456 bytes 3 times --> 895.64 Mbps in 53593.16 usec<br>120: 6291459 bytes 3 times --> 895.58 Mbps in 53596.34 usec<br>
121: 8388605 bytes 3 times --> 896.17 Mbps in 71414.67 usec<br>122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec<br>123: 8388611 bytes 3 times --> 896.14 Mbps in 71417.49 usec<br>
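As a rough cross-check of the table above, the sketch below recomputes throughput from the byte count and time on one output line. It assumes that the "Mbps" column uses 2**20 bits per megabit; that is inferred from the numbers in this run, not taken from NetPIPE documentation. The helper names (`parse_line`, `netpipe_mbps`, `decimal_mbps`) are made up for illustration.

```python
import re

# Assumption: the table's "Mbps" column uses 2**20 bits per megabit.
# This is inferred from the numbers above, not from NetPIPE documentation.
NETPIPE_MEGABIT = 2 ** 20

# Matches lines such as:
#   "122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec"
LINE_RE = re.compile(
    r"(\d+) bytes\s+\d+ times\s+-->\s+([\d.]+) Mbps in\s+([\d.]+) usec"
)


def parse_line(line):
    """Return (size_bytes, reported_mbps, usec), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def netpipe_mbps(size_bytes, usec):
    """Throughput in the units the table appears to use (2**20 bits per second)."""
    return size_bytes * 8 / (usec * 1e-6) / NETPIPE_MEGABIT


def decimal_mbps(size_bytes, usec):
    """Throughput in decimal megabits per second (10**6 bits per second)."""
    # Bits per microsecond is numerically equal to 10**6 bits per second.
    return size_bytes * 8 / usec


sample = "122: 8388608 bytes 3 times --> 896.18 Mbps in 71413.99 usec"
size, reported, usec = parse_line(sample)
print("reported:", reported)
print("recomputed (2**20 units):", round(netpipe_mbps(size, usec), 2))
print("decimal Mbit/s:", round(decimal_mbps(size, usec), 1))
```

Under that assumption the largest transfers come out near 940 decimal Mbit/s, which is close to the ceiling usually quoted for TCP payload on gigabit Ethernet with a 1500-byte MTU.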
<br><br>I'll run it on each node and let you know if anything is out of place. I believe the above results are fine for GigE, yes?<br><br>- Dave<br><br><div class="gmail_quote">On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <span dir="ltr"><<a href="mailto:slang@mcs.anl.gov">slang@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div style=""><div><br></div>David,<div><br></div><div>It sounds like your initial thought (that there is a network problem) could be correct. I would probably explore that first. What sort of numbers do you get from netpipe runs (or even bmi_pingpong) between client and server?</div>
<div><br></div><font color="#888888"><div>-sam</div></font><div><div></div><div class="h5"><div><br><div><div>On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:</div><br><blockquote type="cite">Sorry for not being clear.<br>
<br>The hardware and software are unchanged. Runs from a few months ago (on 2.8.0) performed as expected. Current runs (on both 2.8.0 and 2.8.1) are slow.<br><br>The nodes are sitting there with very low CPU usage even when running the benchmark. I'm the only one running any jobs and there aren't any other processes running (the system load is < .02 and the CPU usage is pretty much 0%).<br>
<br>The local disks haven't changed and are empty except for the pvfs2 storage space; performance is bad even when I put the PVFS2 file system storage onto a very fast (>300 MB/s local bandwidth) Atrato vlun connected over Fibre Channel.<br>
<br>My initial thought is that some hardware along the line died but I can't seem to pinpoint it. All of the network interfaces show 0 errors and 0 dropped packets.<br><br>- Dave<br><br><div class="gmail_quote">On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <span dir="ltr"><<a href="mailto:rross@mcs.anl.gov" target="_blank">rross@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi David,<br> <br> I still don't get it: when was the performance good? Same software and hardware, just some time in the past? Or is there a software change?<br>
<br> The nodes aren't being used for anything else, there are no rogue processes, and the local file systems are otherwise empty?<br> <br> Thanks,<br><font color="#888888"> <br> Rob<br></font><div> <br> On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:<br>
<br> </div><div><div></div><div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> Rob -<br> <br> Performance is down across all PVFS2 installations. The benchmark simply creates files of a random size (between 1 and 25 MB) in a single folder on the mounted PVFS2 partition, 16 KB at a time. It's not anywhere near ideal, but it's the workload I'm working with.<br>
<br> Prior to this problem we were getting ~22 MB/s write throughput and we're down to about 2.5 MB/s for no apparent reason. Reads are down from about 55 MB/s to 30 MB/s. No hardware has changed and as far as I can tell no hardware has died either.<br>
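A minimal sketch of the kind of workload described above, for anyone who wants to reproduce it: files of random size written into one directory, 16 KB per write. This is not the actual benchmark; the function name `create_files`, the file naming, and the use of a temporary directory in the demo are assumptions made for illustration. Point `directory` at the mounted PVFS2 partition and use sizes of 1 to 25 MB to approximate the real run.

```python
import os
import random
import tempfile
import time

CHUNK = 16 * 1024  # 16 KB per write, matching the description above


def create_files(directory, count, min_bytes, max_bytes, seed=0):
    """Write `count` files of random size into `directory`, CHUNK bytes at a time.

    Returns (total_bytes_written, elapsed_seconds).
    """
    rng = random.Random(seed)
    buf = b"\0" * CHUNK
    total = 0
    start = time.perf_counter()
    for i in range(count):
        size = rng.randint(min_bytes, max_bytes)
        path = os.path.join(directory, f"file_{i:06d}.dat")
        # buffering=0 so each f.write() is passed straight to the OS
        # instead of being coalesced by Python's userspace buffer.
        with open(path, "wb", buffering=0) as f:
            remaining = size
            while remaining > 0:
                n = min(CHUNK, remaining)
                written = f.write(buf[:n])  # may be short; loop handles it
                remaining -= written
        total += size
    elapsed = time.perf_counter() - start
    return total, elapsed


# Small demo in a temporary directory (sizes scaled down from 1-25 MB).
with tempfile.TemporaryDirectory() as demo_dir:
    nbytes, secs = create_files(demo_dir, count=4, min_bytes=1024, max_bytes=256 * 1024)
    print(f"wrote {nbytes} bytes in {secs:.4f} s")
```

Running the same script against a local disk and against the PVFS2 mount would give a simple way to tell whether the slowdown follows the file system or the machine.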
<br> - Dave<br> <br> <br> On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <<a href="mailto:rross@mcs.anl.gov" target="_blank">rross@mcs.anl.gov</a>> wrote:<br> Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe the benchmark and how you are doing your measurements?<br>
<br> Rob<br> <br> <br> On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:<br> <br> Hello all -<br> <br> I'm having trouble figuring out a problem with performance degradation on a simple 10-node cluster. Prior runs on the cluster (before this problem manifested itself) resulted in bandwidth and IOPS about 10 times higher on a small file creation workload. Each node is running as a metadata server and a data server.<br>
<br> The problem is persistent between versions and installations of PVFS2 2.8.0 and 2.8.1. Rebooting all of the nodes didn't improve anything. The network connections (simple GigE) showed no errors or dropped packets. Using different physical disks (both SAS and FC) didn't improve things. The kernel logs didn't show anything out of place nor did the pvfs2 server or client logs. It seems like a network issue but I can't seem to find anything wrong with any of the connections.<br>
<br> Has anyone seen this kind of problem before? I seem to remember something on the list before about performance suddenly dropping but I can't find the message now (of course). Any insight would be appreciated!<br>
<br> Thanks,<br> <br> - Dave<br> _______________________________________________<br> Pvfs2-developers mailing list<br> <a href="mailto:Pvfs2-developers@beowulf-underground.org" target="_blank">Pvfs2-developers@beowulf-underground.org</a><br>
<a href="http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers" target="_blank">http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers</a><br> <br> <br> </blockquote> <br> </div></div></blockquote>
</div><br>
</blockquote></div><br></div></div></div></div><br>
<br></blockquote></div><br>