[Pvfs2-developers] Re: dtype I/O bugs
Sam Lang
slang at mcs.anl.gov
Thu Jun 15 12:54:05 EDT 2006
On Jun 14, 2006, at 5:05 PM, Avery Ching wrote:
> Certainly I was able to at least identify one bug I think. The small
> I/O path is being used based on whether the amount of data going to the
> I/O servers is below the max_unexep_payload. However, when the request
> gets to the server, small-io.sm calls PINT_Process_request() once and
> then calls job_trove_bstream_write_list once. Then it returns. If the
> number of stream offset-length pairs generated is greater than
> SMALL_IO_MAX_REGIONS then the operation won't finish. This won't show
> up in the list I/O path since we break it up on 64 ol-pairs. It shows
> up on the datatype I/O path since it doesn't get broken up. You could
> probably trigger it in list I/O by just making SMALL_IO_MAX_REGIONS
> smaller.
>
> Suggestions for fixing:
>
> 1) (Preferred) Loop around the job_trove_bstream_write_list and
> job_trove_bstream_read_list calls to keep moving data until the entire
> datatype has been satisfied.
>
> 2) (Alternative) Make the offset-length pairs limit part of the
> requirement for small I/O.
>
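The failure mode described above, and the loop proposed as option 1, can be illustrated with a small toy model. This is only a sketch in Python, not PVFS2 code: `process_request`, `write_list`, and the value of `MAX_REGIONS` are hypothetical stand-ins for PINT_Process_request(), job_trove_bstream_write_list, and SMALL_IO_MAX_REGIONS, chosen just to show why a single process/write round leaves data unwritten once a request flattens to more offset-length pairs than the limit.

```python
# Toy model only: these names are hypothetical stand-ins, not the PVFS2 API.
MAX_REGIONS = 4  # stands in for SMALL_IO_MAX_REGIONS; the value is illustrative


def process_request(regions, start, max_regions=MAX_REGIONS):
    """Return up to max_regions (offset, length) pairs beginning at index start."""
    return regions[start:start + max_regions]


def write_list(store, batch):
    """Pretend to write each (offset, length) pair; return bytes written."""
    written = 0
    for offset, length in batch:
        store[offset] = length
        written += length
    return written


def small_io_single_pass(regions):
    """One process/write round, as described for the current small I/O path."""
    store = {}
    return write_list(store, process_request(regions, 0))


def small_io_looped(regions):
    """Option 1: keep processing and writing until every region is consumed."""
    store = {}
    done = 0
    total = 0
    while done < len(regions):
        batch = process_request(regions, done)
        total += write_list(store, batch)
        done += len(batch)
    return total


# A datatype that flattens to 10 regions of 8 bytes each, 16 bytes apart.
regions = [(i * 16, 8) for i in range(10)]
expected = sum(length for _, length in regions)

print(small_io_single_pass(regions), "of", expected, "bytes written")
print(small_io_looped(regions), "of", expected, "bytes written")
```

In this model the single pass stops after the first `MAX_REGIONS` pairs, while the looped version writes every region. The real state machine would also have to carry request-processing state between rounds, which the sketch reduces to a simple index.
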
Thanks for debugging this, Avery. For now I went with option #2 since
it's easier :-). If you find that small I/O is a big improvement for
list I/O then we can change it to do option 1. Can you let me know if
this patch fixes the problem for you?
Thanks,
-sam
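
For readers without the attachment, option #2 amounts to adding a second condition to the small I/O eligibility test. The following Python sketch is an assumption about the shape of that check, not the contents of smallio.patch: the function name, parameter names, and the numeric limits in the usage lines are all hypothetical. The strict comparison on payload follows the wording "below" in the quoted message, and the region comparison follows "greater than SMALL_IO_MAX_REGIONS".

```python
# Hypothetical sketch of the option #2 check; not taken from smallio.patch.
def use_small_io(payload_bytes, region_count, max_unexp_payload, max_regions):
    """Return True only if the request fits both small I/O limits.

    payload_bytes     -- amount of data going to the I/O server
    region_count      -- number of offset-length pairs the request flattens to
    max_unexp_payload -- payload limit for the small I/O path (assumed name)
    max_regions       -- stands in for SMALL_IO_MAX_REGIONS
    """
    fits_payload = payload_bytes < max_unexp_payload
    fits_regions = region_count <= max_regions
    return fits_payload and fits_regions


# Illustrative limits only; the real values come from the PVFS2 build.
print(use_small_io(1024, 4, 8192, 64))     # small and few regions
print(use_small_io(1024, 2048, 8192, 64))  # small payload, too many regions
```

A request that fails the region test would fall back to the regular flow-based I/O path, which already handles arbitrarily many regions.
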
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smallio.patch
Type: application/octet-stream
Size: 2907 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20060615/b6db27cc/smallio.obj
-------------- next part --------------
> Avery
>
> On Tue, 2006-06-13 at 17:44 -0500, Avery Ching wrote:
>> I was able to repeat the bug on the 4 server 20 client setup you had. I
>> also made it happen on 1 client and 2 servers. It seems to work fine with
>> 1 server and 1 client or 1 server and 20 clients, therefore, this probably
>> is a multi-server issue. I'll investigate further and let you know the
>> progress. I hope it's not another one of those PINT_Process_req() or
>> flow type problems!
>>
>> Avery
>>
>> On Mon, 12 Jun 2006, Avery Ching wrote:
>>
>>> Yeah I have. I am not sure exactly what the problem is to be honest.
>>> Basically that error message is just reporting what it got from the
>>> PVFS_Sys_write() call. Therefore, it could be a lot of things. The odd
>>> thing though is that it seems to happen at random places. The test works
>>> fine for other sizes, just fails on certain ones. I'm wondering whether
>>> it's related to the flow or Pint_process_request() problems we've
>>> been seeing on the listserv. Oddly enough, when I did my IPDPS testing, I
>>> never ran into that issue for write, only for read (just sometimes -
>>> hence I had no read results =) ).
>>>
>>> Unfortunately, debugging the flow and PINT_process_req() areas is quite
>>> difficult. I'll try and look into it a bit though. At least see if I can
>>> repeat the bug.
>>>
>>> Avery
>>>
>>> suspect that the write call is not returning the correct amount of data
>>> processed.
>>>
>>> On Mon, 12 Jun 2006, Robert Latham wrote:
>>>
>>>> Hi Avery
>>>> I've got another hpio bug:
>>>>
>>>> with 4 servers, 20 clients, hpio ran for a long long time and then
>>>> died like this:
>>>> write | region_count | c-nc | datatype
>>>> ----------------time (seconds)--------------|-bandwidth (MB/s)|---test type---
>>>> open  | io    | sync  | close | total | IO    | IOsyn | region_count
>>>> 0.062 | 8.160 | 0.208 | 0.000 | 8.429 | 0.031 | 0.030 | 2048
>>>> ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071138032 bytes.
>>>> ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071081488 bytes.
>>>>
>>>> Seen anything like this before?
>>>> ==rob
>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
>>>> Argonne National Labs, IL USA                B29D F333 664A 4280 315B
>>>>
>>>
>