[Pvfs2-users] PVFS v2.8.0 compile problems
Phil Carns
carns at mcs.anl.gov
Fri Mar 20 09:01:49 EST 2009
Hi Tony,
This is most likely due to a change in how PVFS 2.8.x tracks file size
during writes beyond EOF. It now stores the file size explicitly in
Berkeley DB for each data file. This is required for the new directio
storage method, but we applied it to the other methods as well to
simplify compatibility.
A test that you could run to confirm this would be to set the following
in the StorageHints section of your PVFS server configuration file:
TroveSyncMeta no
With that set to "no", PVFS will still synchronize metadata (including
the explicit size field), but it may delay synchronization until after
an acknowledgement has been sent to the client. This will probably hide
the size update cost for a serial application.
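For reference, here is a rough sketch of where that setting lives. It
assumes the usual layout in which StorageHints is a block inside the
Filesystem section of the server config file. Any other entries you
already have in that block stay as they are, and the servers will
presumably need to be restarted to pick up the change.

```
<StorageHints>
    TroveSyncMeta no
</StorageHints>
```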
The size update overhead will only show up for serialized applications
that issue small writes beyond EOF (like iozone in the "initial write"
phase). If it were a parallel application, PVFS would coalesce the size
updates to reduce overhead. If it were a serial application that used
larger writes, the size update cost would be amortized over a longer
period of time.
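To illustrate the amortization argument, here is a toy model (not a
measurement, and not how PVFS accounts for anything internally). It
assumes every write beyond EOF pays one fixed size-update cost on top of
the raw transfer time. The function name, the bandwidth, and the
per-update cost are all made-up values for illustration only.

```python
def effective_throughput(total_bytes, write_size, raw_bw, size_update_cost):
    """Toy model: each write beyond EOF pays a fixed size-update cost.

    total_bytes      -- bytes written in the initial-write phase
    write_size       -- bytes per write call
    raw_bw           -- transfer rate in bytes/sec without size updates (assumed)
    size_update_cost -- seconds per synchronous size update (assumed)
    Returns the effective rate in bytes/sec.
    """
    n_writes = -(-total_bytes // write_size)  # ceiling division
    transfer_time = total_bytes / raw_bw
    update_time = n_writes * size_update_cost
    return total_bytes / (transfer_time + update_time)


if __name__ == "__main__":
    total = 1 << 30   # 1 GiB file (hypothetical)
    raw_bw = 200e6    # 200 MB/s raw rate (hypothetical)
    cost = 1e-3       # 1 ms per size update (hypothetical)
    for write_size in (4 << 10, 64 << 10, 1 << 20, 16 << 20):
        rate = effective_throughput(total, write_size, raw_bw, cost) / 1e6
        print(f"{write_size >> 10:>6} KiB writes: {rate:8.1f} MB/s")
```

In this model the fixed cost dominates for small writes and fades as the
write size grows, which is the shape of the effect described above.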
Regarding your log file warnings, those are normal. In 2.8.x the
servers communicate with each other on startup to precreate datafile
objects. A server issues those warnings on occasion if one or more of
the other servers are not yet up and running when it tries to do that,
but the warnings will stop as soon as all servers are available.
thanks,
-Phil
Tony Kew wrote:
> Dear Phil,
>
> Irrespective of the --enable-mmap-racache option, there does seem to be
> a marked performance drop between PVFS version 2.7.1 (with the 20 or
> so patches along the way - not that any were for performance insofar as
> I am aware) and version 2.8.x
>
> Version 2.8.1 was built with the following configure options:
> ./configure --prefix=/usr \
> --libdir=/usr/lib64 \
> --enable-perf-counters \
> --enable-fast \
> --with-kernel=%{_kernelsrcdir} \
> --enable-shared
>
> Version 2.7.1 (fully patched) was configured as above, with the
> addition of the --enable-mmap-racache option.
>
> I ran three iozone tests for each of the tested distributions, using
> a PBS batch job that creates a (new) filesystem across all the nodes
> in the job. The batch job runs iozone in parallel, with one
> data stream on each node. The test directory is configured as
> a stripe across all the nodes, so each node is writing to all the other
> nodes during the test.
>
> The average I/O numbers from running three iozone runs,
> writing to a directory configured using the "Simple Stripe" distribution:
>
> v2.7.1:
>
> Initial write: 219,306.19 KB/sec
> Rewrite: 130,799.13 KB/sec
> Read: 183,249.66 KB/sec
> Re-read: 191,565.02 KB/sec
>
>
> v2.8.1:
>
> Initial write: 40,381.42 KB/sec
> Rewrite: 132,908.15 KB/sec
> Read: 203,758.06 KB/sec
> Re-read: 276,100.11 KB/sec
>
> For a TwoD Stripe distribution:
>
> v2.7.1:
>
> Initial write: 343,876.68 KB/sec
> Rewrite: 229,740.04 KB/sec
> Read: 167,045.91 KB/sec
> Re-read: 166,417.03 KB/sec
>
> v2.8.1:
>
> Initial write: 140,253.67 KB/sec
> Rewrite: 201,923.75 KB/sec
> Read: 182,109.70 KB/sec
> Re-read: 205,073.70 KB/sec
>
>
>
> In the server log files for the v2.8.1 runs there are many of these:
>
> [E 03/06 16:26] Warning: msgpair failed to tcp://c14n24:3334, will retry: Connection refused
>
> ...but only at the time when the filesystem is created, so I don't
> believe they have any bearing on the test results.
>
> Let me know if I can provide any more info, or if further tests
> would be of use....
>
>
> Many Thanks,
> Tony
>
> Tony Kew
> SAN Administrator
> The Center for Computational Research
> New York State Center of Excellence
> in Bioinformatics & Life Sciences
> 701 Ellicott Street, Buffalo, NY 14203
>
> CoE Office: (716) 881-8930 Fax: (716) 849-6656
> CSE Office: (716) 645-3797 x2174
> Cell: (716) 560-0910
>
> "I love deadlines, I love the whooshing noise they make as they go by."
> Douglas Adams
>
>
>
> Tony Kew wrote:
>> Dear Phil,
>>
>> PVFS 2.8.1 works (insofar as I can tell) with the --enable-mmap-racache
>> configure option with the following patch:
>>
>> --- src/apps/kernel/linux/pvfs2-client-core.c.orig 2009-02-27 15:53:50.000000000 -0500
>> +++ src/apps/kernel/linux/pvfs2-client-core.c 2009-02-27 15:54:22.000000000 -0500
>> @@ -1609,7 +1609,7 @@ static PVFS_error post_io_readahead_requ
>> &vfs_request->in_upcall.credentials,
>> &vfs_request->response.io,
>> vfs_request->in_upcall.req.io.io_type,
>> - &vfs_request->op_id, (void *)vfs_request);
>> + &vfs_request->op_id, vfs_request->hints, (void *)vfs_request);
>>
>> if (ret < 0)
>> {
>>
>>
>> I am, though, getting poor iozone performance numbers for initial
>> write... I'm going to run some more iozone tests & try without the
>> --enable-mmap-racache and let you know what I find...
>>
>>
>> Thanks,
>> Tony
>>
>>
>> Tony Kew wrote:
>>> Dear Phil,
>>>
>>> Thanks for the info, the option worked for me with 2.7.1 codebase, for
>>> what it's worth. The 2.8.0 code with my patch works so far (with the
>>> very limited tests I have done.) I'll be testing over several nodes
>>> this afternoon, or tomorrow.
>>>
>>> [...]
>>>
>>>
>>> Phil Carns wrote:
>>>> Hi Tony,
>>>>
>>>> I just wanted to mention that the second compile problem that you
>>>> pointed out is from a code path that gets enabled with the
>>>> --enable-mmap-racache option. That particular option is
>>>> experimental and (as you have found) not well tested. I would not
>>>> advise using it in a production setting.
>>>>
>>>> -Phil
>>