[Pvfs2-developers] parallel state machine code
Walter B. Ligon III
walt at clemson.edu
Tue Sep 5 11:33:10 EDT 2006
OK, I think I fixed the small-io problem and the mkdir problem.
That only leaves the mounting problem. I've never attempted to build
the kernel interface or mount the file system (being the old goat that I
am) so that might take a bit.
I'll commit the changes I made and you can run them in the next nightly
to see if anything new pops up.
Walt
Robert Latham wrote:
> On Tue, Aug 29, 2006 at 04:55:06PM -0400, Walter B. Ligon III wrote:
>
>>So, I would appreciate some help running some tests on the branch, while
>>I start documenting, and let me know when you think I should start
>>merging it back with the trunk. Or I'm open to whatever other
>>suggestions ...
>
>
> OK, Walt, we're getting close. I committed a couple small fixes to
> get pvfs2-client-core building. Here's what's not working so well
> right now:
>
> - mounting pvfs2 fails with a timeout
>
> - many MPI-IO workloads pass, but the noncontig test triggered a
> segfault in small_io_cleanup, where it cleans up various fields in
> the sm_p structure. In particular, 'sm_p->msgarray = NULL' caused a
> core dump, and when I look at that core file in gdb,
> sm_p->msgarray_count is really high (135950228). Looks like maybe
> the sm_p wasn't properly allocated? I dunno, I'm just the messenger.
>
> - pvfs2-cp dies with a segfault when using a very small blocksize (-b
> 128). Here's where gdb says the fault lies:
>
> ---------------
> #0 0x0806d3d8 in small_io_completion_fn (user_args=0x80f0da8,
> resp_p=0xbfffb42c, index=0) at sys-small-io.sm:242
> 242 fdata.server_nr = sm_p->u.io.datafile_index_array[index];
> (gdb) p sm_p->u.io
> $8 = {io_type = 135162104, file_req = 0x2, file_req_offset = 0, buffer = 0x0,
> mem_req = 0x0, io_resp_p = 0x50, flowproto_type = 17, encoding = 135206232,
> datafile_index_array = 0x0, datafile_count = 0,
> msgpair_completion_count = 81, flow_completion_count = 0,
> write_ack_completion_count = 0, contexts = 0x80f13d4,
> context_count = 135205832, total_cancellations_remaining = 0,
> retry_count = 135206064, stored_error_code = 3396, total_size = 9,
> dfile_size_array = 0x0, small_io = 0}
> ---------------
>
> - test-zero-fill fails with a segfault in the same function as pvfs2-cp
> (small_io_completion_fn), though at a different line:
>
> ---------------
> #0 0x08065149 in small_io_completion_fn (user_args=0x80e9940,
> resp_p=0xbfffb86c, index=0) at sys-small-io.sm:317
> 317 sm_p->u.io.dfile_size_array[index] = resp_p->u.small_io.bstream_size;
> ---------------
>
> - pvfs2-mkdir (a test contributed by acxiom) fails with a segfault:
>
> ---------------
> #0 0x080b134e in PINT_smcb_op (smcb=0x0)
> at /sandbox/robl/pvfs2-nightly/pvfs2-WALT3/src/common/misc/state-machine-fns.c:348
> 348 return smcb->op;
> ---------------
>
>
> So I think if you can take care of the small-io cases, that would be a
> good start, as it would knock out 3 of the 5 failures. Once WALT3
> passes our nightlies, we can think about merging into HEAD.
>
> ==rob
>
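
For anyone following the thread, the two small-io backtraces above both
index an array whose pointer is 0x0 in the gdb dump, and the noncontig
case shows a garbage count, which is what an uninitialized state struct
tends to look like. Here is a minimal sketch of the defensive pattern in
C. It is not the actual fix that was committed: the struct
demo_io_state and the functions demo_state_new, demo_state_free, and
demo_completion_fn are hypothetical stand-ins, not PVFS2 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-operation state; NOT the PVFS2 struct. */
struct demo_io_state {
    int32_t *index_array;  /* response index -> server number */
    int64_t *size_array;   /* per-datafile size reported by a server */
    int      count;        /* valid entries in both arrays */
};

/* Release a state and its arrays; safe to call with NULL. */
void demo_state_free(struct demo_io_state *st)
{
    if (st == NULL)
        return;
    free(st->index_array);
    free(st->size_array);
    free(st);
}

/* Allocate a state with every field set to a known value. */
struct demo_io_state *demo_state_new(int count)
{
    struct demo_io_state *st;

    if (count < 0)
        return NULL;
    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return NULL;
    /* Set pointers explicitly rather than relying on all-bits-zero. */
    st->index_array = NULL;
    st->size_array = NULL;
    st->count = 0;
    if (count > 0) {
        st->index_array = calloc((size_t)count, sizeof(*st->index_array));
        st->size_array = calloc((size_t)count, sizeof(*st->size_array));
        if (st->index_array == NULL || st->size_array == NULL) {
            demo_state_free(st);
            return NULL;
        }
        st->count = count;
    }
    return st;
}

/*
 * Completion callback sketch: validate the state and the index before
 * touching either array. Returns 0 on success, -1 on invalid input.
 */
int demo_completion_fn(struct demo_io_state *st, int index,
                       int64_t bstream_size, int32_t *server_nr_out)
{
    if (st == NULL || server_nr_out == NULL)
        return -1;
    if (st->index_array == NULL || st->size_array == NULL)
        return -1;
    if (index < 0 || index >= st->count)
        return -1;
    *server_nr_out = st->index_array[index];
    st->size_array[index] = bstream_size;
    return 0;
}
```

With checks like these, a callback handed a state whose arrays were
never set up returns an error instead of dereferencing a null pointer.
The checks only hide the symptom, though; the real question is still
why the arrays were missing when the callback ran.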
--
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University