[Pvfs2-developers] PVFS2 Performance Problem -
Sam Lang
slang at mcs.anl.gov
Wed Jul 1 19:04:06 EDT 2009
David,
I hate to question what you've said, but are you sure that you were
getting good performance with 2.8.0? Is it possible that you only got
good performance with 2.7.1, and that switching to 2.8.0 (and 2.8.1)
has caused this performance degradation? I ask because (as Rob has
hinted at) we changed the way we manage the size of datafiles in
releases >=2.8.0, and we've seen performance drops for serial,
small-file workloads. It's a bug, and we've fixed it in CVS, but you may
be seeing it in your setup.
-sam
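
For reference, the workload David describes further down the thread (files of a random size between 1 and 25 MB, created in a single directory and written 16 KB at a time) could be approximated with something like the sketch below. This is not the actual benchmark, only a rough stand-in: the function name, the file naming scheme, the mount point in the usage section, and the way throughput is computed are all assumptions made for illustration.

```python
import os
import random
import time

CHUNK = 16 * 1024   # 16 KB per write call, as described in the thread
MB = 1024 * 1024


def write_random_files(directory, num_files,
                       min_bytes=1 * MB, max_bytes=25 * MB, seed=None):
    """Serially create num_files files of random size in one directory.

    Hypothetical helper: returns (total_bytes_written, elapsed_seconds).
    """
    rng = random.Random(seed)
    buf = b"\0" * CHUNK
    total = 0
    start = time.monotonic()
    for i in range(num_files):
        size = rng.randint(min_bytes, max_bytes)
        path = os.path.join(directory, "file_%06d.dat" % i)
        # buffering=0 so each write() is issued as its own 16 KB request
        with open(path, "wb", buffering=0) as f:
            remaining = size
            while remaining > 0:
                n = min(CHUNK, remaining)
                written = f.write(buf[:n])
                remaining -= written
        total += size
    elapsed = time.monotonic() - start
    return total, elapsed


if __name__ == "__main__":
    # Assumed mount point; replace with the real PVFS2 mount.
    target = "/mnt/pvfs2/bench"
    os.makedirs(target, exist_ok=True)
    nbytes, secs = write_random_files(target, num_files=20)
    if secs > 0:
        print("wrote %.1f MB in %.1f s (%.2f MB/s)"
              % (nbytes / MB, secs, nbytes / MB / secs))
```

Running the same script against a local disk and against the PVFS2 mount gives a quick way to tell whether the slowdown follows the file system or the hardware.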
On Jul 1, 2009, at 5:57 PM, David Bonnie wrote:
> Sam -
>
> All of the nodes checked out fine with netpipe, still no errors on
> any of the adapters.
>
> - Dave
>
> On Wed, Jul 1, 2009 at 4:47 PM, Sam Lang <slang at mcs.anl.gov> wrote:
>
> On Jul 1, 2009, at 5:45 PM, David Bonnie wrote:
>
>> I'll run it on each node and let you know if anything is out of
>> place. I believe the above results are fine for GigE, yes?
>
> They certainly don't match with the numbers you're getting from PVFS.
> -sam
>
>>
>> - Dave
>>
>> On Wed, Jul 1, 2009 at 4:20 PM, Sam Lang <slang at mcs.anl.gov> wrote:
>>
>> David,
>>
>> It sounds like your initial thought (that there is a network
>> problem) could be correct. I would probably explore that first.
>> What sort of numbers do you get from netpipe runs (or even
>> bmi_pingpong) between client and server?
>>
>> -sam
>>
>> On Jul 1, 2009, at 5:15 PM, David Bonnie wrote:
>>
>>> Sorry for not being clear.
>>>
>>> The hardware and software are unchanged. Runs from a few months
>>> ago (on 2.8.0) performed as expected. Current runs (on both 2.8.0
>>> and 2.8.1) are slow.
>>>
>>> The nodes are sitting there with very low CPU usage even when
>>> running the benchmark. I'm the only one running any jobs and
>>> there aren't any processes running (the system load is < .02 and
>>> the cpu usage is pretty much 0%).
>>>
>>> The local disks haven't changed and are empty except for the pvfs2
>>> storage space; performance is bad even when I put the PVFS2 file
>>> system storage onto a very fast (>300 MB/s local bandwidth) Atrato
>>> vlun connected over fiber channel.
>>>
>>> My initial thought is that some hardware along the line died but I
>>> can't seem to pinpoint it. All of the network interfaces show 0
>>> errors and 0 dropped packets.
>>>
>>> - Dave
>>>
>>> On Wed, Jul 1, 2009 at 4:10 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>>> Hi David,
>>>
>>> I still don't get it: when was the performance good? Same software
>>> and hardware, just some time in the past? Or is there a software
>>> change?
>>>
>>> The nodes aren't being used for anything else, there are no rogue
>>> processes, and the local file systems are otherwise empty?
>>>
>>> Thanks,
>>>
>>> Rob
>>>
>>> On Jul 1, 2009, at 5:05 PM, David Bonnie wrote:
>>>
>>> Rob -
>>>
>>> Performance is down across all PVFS2 installations. The benchmark
>>> simply creates files of a random size (between 1 and 25 MB) in a
>>> single folder on the mounted PVFS2 partition, 16 KB at a time.
>>> It's not anywhere near ideal, but it's the workload I'm working
>>> with.
>>>
>>> Prior to this problem we were getting ~22 MB/s write throughput
>>> and we're down to about 2.5 MB/s for no apparent reason. Reads
>>> are down from about 55 MB/s to 30 MB/s. No hardware has changed
>>> and as far as I can tell no hardware has died either.
>>>
>>> - Dave
>>>
>>>
>>> On Wed, Jul 1, 2009 at 4:00 PM, Rob Ross <rross at mcs.anl.gov> wrote:
>>> Do you mean that 2.8.0 is fast and 2.8.1 is slow? Can you describe
>>> the benchmark and how you are doing your measurements?
>>>
>>> Rob
>>>
>>>
>>> On Jul 1, 2009, at 4:43 PM, David Bonnie wrote:
>>>
>>> Hello all -
>>>
>>> I'm having trouble figuring out a problem with performance
>>> degradation on a simple 10-node cluster. Prior runs on the
>>> cluster (before this problem manifested itself) resulted in
>>> bandwidth and IOPS about 10 times higher on a small file creation
>>> workload. Each node is running as a metadata server and a data
>>> server.
>>>
>>> The problem is persistent between versions and installations of
>>> PVFS2 2.8.0 and 2.8.1. Rebooting all of the nodes didn't improve
>>> anything. The network connections (simple GigE) showed no errors
>>> or dropped packets. Using different physical disks (both SAS and
>>> FC) didn't improve things. The kernel logs didn't show anything
>>> out of place nor did the pvfs2 server or client logs. It seems
>>> like a network issue but I can't seem to find anything wrong with
>>> any of the connections.
>>>
>>> Has anyone seen this kind of problem before? I seem to remember
>>> something on the list before about performance suddenly dropping
>>> but I can't find the message now (of course). Any insight would
>>> be appreciated!
>>>
>>> Thanks,
>>>
>>> - Dave
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>