[Pvfs2-developers] Re: dtype I/O bugs
Avery Ching
aching at ece.northwestern.edu
Wed Jun 14 18:05:49 EDT 2006
I think I was able to identify at least one bug. The small I/O path is
chosen based on whether the amount of data going to the I/O servers is
below the max_unexep_payload. However, when the request gets to the
server, small-io.sm calls PINT_Process_request() once, then calls
job_trove_bstream_write_list once, and then returns. If the number of
stream offset-length pairs generated is greater than
SMALL_IO_MAX_REGIONS, the operation won't finish. This doesn't show up
in the list I/O path, since there we break the request up every 64
ol-pairs. It shows up on the datatype I/O path because the request
doesn't get broken up there. You could probably trigger it in list I/O
by just making SMALL_IO_MAX_REGIONS smaller.
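
To make the failure mode concrete, here is a minimal model of the
single-pass behavior described above. It is only a sketch and does not
use the real PVFS2 code: the sketch_* names are hypothetical, and the
value 64 for the region limit is an assumption chosen for illustration
(the actual SMALL_IO_MAX_REGIONS value is not stated in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for SMALL_IO_MAX_REGIONS; not the real value. */
#define SKETCH_MAX_REGIONS 64

/* Hypothetical model of one request-processing call: it emits at most
 * max_regions offset-length pairs and returns how many it emitted. */
size_t sketch_process_once(size_t pairs_remaining, size_t max_regions)
{
    assert(max_regions > 0);
    return pairs_remaining < max_regions ? pairs_remaining : max_regions;
}

/* Model of the single-pass path: process once, issue one list write,
 * then return. The result is the number of pairs never serviced. */
size_t sketch_single_pass_leftover(size_t total_pairs)
{
    size_t done = sketch_process_once(total_pairs, SKETCH_MAX_REGIONS);
    return total_pairs - done;
}
```

Under these assumptions, a request that expands to no more than
SKETCH_MAX_REGIONS pairs leaves nothing behind, while a larger one
leaves the remainder unwritten.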
Suggestions for fixing:
1) (Preferred) Loop around the job_trove_bstream_write_list and
job_trove_bstream_read_list calls to keep moving data until the entire
datatype has been satisfied.
2) (Alternative) Make the offset-length pairs limit part of the
requirement for small I/O.
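
The two suggestions above could look roughly like the following. This is
a sketch under stated assumptions, not the small-io.sm implementation:
the sketch_* functions and their parameters are hypothetical, the real
job_trove_bstream_write_list/read_list calls are only indicated by a
comment, and whether the size limits are inclusive is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Option 1 (sketch): keep issuing batches of at most max_regions
 * offset-length pairs until every pair has been serviced.
 * Returns the number of batches and stores the pairs completed. */
size_t sketch_looped_small_io(size_t total_pairs, size_t max_regions,
                              size_t *pairs_done)
{
    size_t batches = 0;
    size_t done = 0;

    assert(max_regions > 0);
    while (done < total_pairs) {
        size_t remaining = total_pairs - done;
        size_t batch = remaining < max_regions ? remaining : max_regions;
        /* A real server would run the request processor and then one
         * list write or list read for these `batch` pairs here. */
        done += batch;
        batches++;
    }
    *pairs_done = done;
    return batches;
}

/* Option 2 (sketch): take the small I/O path only if both the payload
 * size and the pair count fit. Inclusive bounds are an assumption. */
bool sketch_small_io_eligible(size_t payload_bytes, size_t max_payload,
                              size_t total_pairs, size_t max_regions)
{
    return payload_bytes <= max_payload && total_pairs <= max_regions;
}
```

With option 1 a request never ends early because of the region limit;
with option 2 such a request is routed away from the small I/O path
before it reaches the server.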
Avery
On Tue, 2006-06-13 at 17:44 -0500, Avery Ching wrote:
> I was able to repeat the bug on the 4 server 20 client setup you had. I
> also made it happen on 1 client and 2 servers. It seems to work fine with
> 1 server and 1 client or 1 server and 20 clients, therefore, this probably
> is a multi-server issue. I'll investigate further and let you know the
> progress. I hope it's not another one of those PINT_Process_req() or
> flow type problems!
>
> Avery
>
> On Mon, 12 Jun 2006, Avery Ching wrote:
>
> > Yeah I have. I am not sure exactly what the problem is to be honest.
> > Basically that error message is just reporting what it got from the
> > PVFS_Sys_write() call. Therefore, it could be a lot of things. The odd
> > thing though is that it seems to happen at random places. The test works
> > fine for other sizes, just fails on certain ones. I'm wondering whether
> > it's related to the flow or Pint_process_request() problems we've
> > been seeing on the listserv. Oddly enough, when I did my IPDPS testing, I
> > never ran into that issue for write, only for read (just sometimes -
> > hence I had no read results =) ).
> >
> > Unfortunately, debugging the flow and PINT_process_req() areas is quite
> > difficult. I'll try and look into it a bit though. At least see if I can
> > repeat the bug.
> >
> > Avery
> >
> > I suspect that the write call is not returning the correct amount of
> > data processed.
> >
> > On Mon, 12 Jun 2006, Robert Latham wrote:
> >
> > > Hi Avery
> > > I've got another hpio bug:
> > >
> > > with 4 servers, 20 clients, hpio ran for a long long time and then
> > > died like this:
> > > write | region_count | c-nc | datatype
> > > ----------------time (seconds)--------------|-bandwidth (MB/s)|---test type---
> > >    open |      io |    sync |   close |   total |      IO |   IOsyn | region_count
> > >   0.062 |   8.160 |   0.208 |   0.000 |   8.429 |   0.031 |   0.030 | 2048
> > > ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071138032 bytes.
> > > ADIOI_PVFS2_StridedDtypeIO: Warning - PVFS_sys_read/write returned -1610612737 and completed -4611717612071081488 bytes.
> > >
> > > Seen anything like this before?
> > > ==rob
> > >
> > > --
> > > Rob Latham
> > > Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
> > > Argonne National Labs, IL USA B29D F333 664A 4280 315B
> > >
> >