[Pvfs2-developers] libpvfs2 usage
Sam Lang
slang at mcs.anl.gov
Wed Oct 18 17:35:12 EDT 2006
On Oct 18, 2006, at 2:18 PM, Sam Lang wrote:
>
> On Oct 16, 2006, at 5:40 PM, Brett Bode wrote:
>
>> Hello,
>> We have modified an existing application to call libpvfs2
>> directly. Our pvfs2 installation has 6 servers and is set up to run
>> pvfs2 over OpenIB verbs. We borrowed the code more or less from
>> pvfs2-cp. This seems to work and we have had several successful runs.
>> However we have also had a couple of hangs on one node. The
>> traceback for the hang is:
>>
>> #0 0x00002ab9874a34bf in poll () from /lib/libc.so.6
>> #1 0x0000000001cbea67 in BMI_ib_testcontext ()
>> #2 0x0000000001c8feb4 in BMI_testcontext ()
>> #3 0x0000000001c99624 in PINT_thread_mgr_bmi_push ()
>> #4 0x0000000001c950d3 in do_one_work_cycle_all ()
>> #5 0x0000000001c95883 in job_testcontext ()
>> #6 0x0000000001ca37e4 in PINT_client_state_machine_test ()
>> #7 0x0000000001ca3c00 in PINT_client_wait_internal ()
>> #8 0x0000000001c7df71 in PVFS_sys_io ()
>> #9 0x0000000001c6e253 in flushBuffer ()
>> at /afs/.scl.ameslab.gov/project/nodeimg/amd64.test/usr/src/gamess-pvfs/bypassIO-pvfs.c:355
>> #10 0x0000000005eb27b0 in userFilePos ()
>>
>> Eventually we time out and die. So the first question is: do you
>> have any suggestions as to where to look for the cause of the
>> hang? That is a write, but I have seen it fail now during a read
>> as well (it died on the 12th pass through after reading the
>> complete file 11 times).
>>
>> We also have several usage and/or tuning related questions. First
>> off, when the file is created there are options for the
>> "dfile_count" and the "strip_size". Thus far I have left them at
>> defaults. Can you comment on what sort of values would be optimal
>> for sequentially accessed large files? Would matching the IO buffer
>> size the application passes to the strip size be useful?
>
> You're already seeing that matching the stripe size and request
> size gives you far fewer cache misses, which is new info we can add
> to the tuning guide, or maybe Pete can come up with some
> optimizations around that.
Rob pointed out that it's not matching the two that you really want.
The ideal strip size needs to be large enough to prevent a single
request from being many multiples of the stripe size, but small
enough that a request still spans all servers. So for your specific
case, ideally you would have:
strip_size = request_size / number_of_servers
-sam
> Usually the strip size is used to control the behavior of disk
> IO, as it means the trove layer is able to do reads and writes in
> larger chunks. I think we've generally found that for larger
> workloads the default strip size is ideally matched to the size of
> requests. I think just increasing the strip size shouldn't
> necessarily help for sequential accesses.
>
> Up to this point, the dfile_count has only been used to improve
> performance of IO on smaller files, by setting the value to 1, so
> that small requests are not broken down even further. In your case
> it probably makes sense to leave it at its default value.
>
> What 'tuning guide', you say? It's currently a work in
> progress :-). If anyone is interested in helping out, especially
> for the IB sections, we could really use it.
>
>>
>> We also have a problem when running on our IBM EHCAs with
>> too many memory registrations. The odd part is that I am using the
>> same 1MB buffer all the time, so I don't see why it seems to be
>> reregistered at each write. My write code looks like this:
>>
>> file_req = PVFS_BYTE;
>> ret = PVFS_Request_contiguous(ioSize, PVFS_BYTE, &mem_req);
>> if (ret < 0) {
>>     PVFS_perror("PVFS_Request_contiguous", ret);
>>     return;
>> }
>> ret = PVFS_sys_write(target_object.ref, file_req,
>>                      bufferedFilePos, myBuffer, mem_req,
>>                      &credentials, &resp_io);
>> if (ret == 0) {
>>     PVFS_Request_free(&mem_req);
>>     /* return(resp_io.total_completed);*/
>> } else
>>     PVFS_perror("PVFS_sys_write", ret);
>>
>> One question is what does PVFS_Request_contiguous actually do?
>
> It creates a request structure that essentially contains the size
> and offset into the memory buffer.
>
>> Since I am using the same buffer all the time would it be ok to
>> setup the request once and then reuse it so long as the io size is
>> the same?
>
> Yes. The request structure doesn't get modified by the IO call.
> You (correctly) use PVFS_BYTE for the file request. The reason you
> can't just use PVFS_BYTE for the memory request is that the size
> has to be encapsulated in the request as well (while the file
> request gets tiled based on the actual file size).
>
> -sam
>
>>
>> Thanks for any help you can provide,
>>
>> Brett
>>
>>
>> ____________________________________________
>> Dr. Brett Bode
>> 329 Wilhelm Hall
>> Ames Laboratory
>> Iowa State University
>> Ames, IA 50011 (515) 294-9192
>> brett at scl.ameslab.gov FAX: (515) 294-4491
>> ____________________________________________
>>
>>
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>