[Pvfs2-developers] parallel state machine code

Walter B. Ligon III walt at clemson.edu
Tue Sep 5 11:33:10 EDT 2006


OK, I think I fixed the small-io problem and the mkdir problem.
That only leaves the mounting problem.  I've never attempted to build 
the kernel interface or mount the file system (being the old goat that I 
am) so that might take a bit.

I'll commit the changes I made and you can run them in the next nightly 
to see if anything new pops up.

Walt

Robert Latham wrote:
> On Tue, Aug 29, 2006 at 04:55:06PM -0400, Walter B. Ligon III wrote:
> 
>>So, I would appreciate some help running some tests on the branch, while 
>>I start documenting, and let me know when you think I should start 
>>merging it back with the trunk.  Or I'm open to whatever other 
>>suggestions ...
> 
> 
> OK, Walt, we're getting close.  I committed a couple small fixes to
> get pvfs2-client-core building.  Here's what's not working so well
> right now:
> 
> - mounting pvfs2 fails with a timeout
> 
> - many MPI-IO workloads pass, but the noncontig test triggered a
>   segfault in small_io_cleanup, where it cleans up various fields in
>   the sm_p structure.  In particular, 'sm_p->msgarray = NULL' caused a
>   core dump, and when I look at that core file in gdb,
>   sm_p->msgarray_count is really high (135950228).  Looks like maybe
>   the sm_p wasn't properly allocated? I dunno, I'm just the messenger.
> 
> - pvfs2-cp dies with a segfault when using a very small blocksize (-b
>   128). Here's where gdb says the fault lies:
> 
> ---------------
>   #0  0x0806d3d8 in small_io_completion_fn (user_args=0x80f0da8, 
>     resp_p=0xbfffb42c, index=0) at sys-small-io.sm:242
> 242                 fdata.server_nr = sm_p->u.io.datafile_index_array[index];
> (gdb) p sm_p->u.io                            
> $8 = {io_type = 135162104, file_req = 0x2, file_req_offset = 0, buffer = 0x0, 
>   mem_req = 0x0, io_resp_p = 0x50, flowproto_type = 17, encoding = 135206232, 
>   datafile_index_array = 0x0, datafile_count = 0, 
>   msgpair_completion_count = 81, flow_completion_count = 0, 
>   write_ack_completion_count = 0, contexts = 0x80f13d4, 
>   context_count = 135205832, total_cancellations_remaining = 0, 
>   retry_count = 135206064, stored_error_code = 3396, total_size = 9, 
>   dfile_size_array = 0x0, small_io = 0}
> ---------------
> 
> - test-zero-fill fails with a segfault in the same function as pvfs2-cp:
> 
> ---------------
> #0  0x08065149 in small_io_completion_fn (user_args=0x80e9940, 
>     resp_p=0xbfffb86c, index=0) at sys-small-io.sm:317
> 317         sm_p->u.io.dfile_size_array[index] = resp_p->u.small_io.bstream_size;
> ---------------
> 
> - pvfs2-mkdir (a test contributed by acxiom) fails with a segfault:
> 
> ---------------
> #0  0x080b134e in PINT_smcb_op (smcb=0x0)
>     at /sandbox/robl/pvfs2-nightly/pvfs2-WALT3/src/common/misc/state-machine-fns.c:348
> 348         return smcb->op;
> ---------------
> 
> 
> So I think if you can take care of the small-io cases, that would be a
> good start, as it would knock out 3 of the 5 failures.  Once WALT3
> passes our nightlies, we can think about merging into HEAD.
> 
> ==rob
> 

-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
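
In the pvfs2-cp trace quoted above, gdb shows datafile_index_array = 0x0
and datafile_count = 0 at the faulting line, and the pvfs2-mkdir trace
shows PINT_smcb_op called with smcb=0x0. The following is only a minimal
sketch of the general defensive pattern for that kind of fault (start the
state zeroed, check the pointer and the index before dereferencing). It
is not the fix that was committed to the branch. The struct and the
function names io_state, io_state_new and io_state_lookup_server are
hypothetical stand-ins, not PVFS2 code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for per-operation I/O state; NOT the real
 * PVFS2 sm_p->u.io structure. */
struct io_state {
    int  *datafile_index_array;
    int   datafile_count;
    long *dfile_size_array;
};

/* Allocate the state with known initial values so that a field nobody
 * filled in shows up as NULL/0 instead of leftover heap contents. */
struct io_state *io_state_new(void)
{
    struct io_state *st = calloc(1, sizeof(*st));
    if (st != NULL) {
        /* Explicit, because all-bits-zero is not guaranteed to be a
         * null pointer on every platform. */
        st->datafile_index_array = NULL;
        st->dfile_size_array = NULL;
    }
    return st;
}

/* Look up the server number for a given index.
 * Returns 0 on success, -1 if the state or index is unusable. */
int io_state_lookup_server(const struct io_state *st, int index,
                           int *server_nr)
{
    if (st == NULL || server_nr == NULL ||
        st->datafile_index_array == NULL ||
        index < 0 || index >= st->datafile_count) {
        return -1;
    }
    *server_nr = st->datafile_index_array[index];
    return 0;
}
```

With a guard like this, a completion callback that receives state whose
arrays were never set up gets an error return it can report, rather than
faulting on the first array access.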

