[PVFS2-developers] CVS pvfs2-server server daemons crashing

Robert Latham robl at mcs.anl.gov
Thu Jun 2 13:51:54 EDT 2005


On Thu, Jun 02, 2005 at 12:43:07PM -0400, Walter B. Ligon III wrote:
> --------
> 
> Yes, this makes sense.  It looks like the ereq pointer got hosed, but
> the rq pointer is OK.  They should be numerically very close to one
> another.  If not, then the failure happened in decoding the request,
> when the offsets are converted back to pointers.
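A minimal sketch of the pointer-to-offset round trip described above. It assumes the records live in one contiguous array and that each `ereq` pointer is encoded as a byte offset from the array base. The types `req_rec`/`req_enc` and the functions `req_encode`, `req_decode` and `req_roundtrip_gap` are hypothetical. This is not the PVFS2 encoder, and the real `PINT_Request` layout differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a request record; NOT the real PINT_Request layout. */
struct req_rec {
    int64_t num_ereqs;
    struct req_rec *ereq;      /* element request: another record in the same array */
};

/* Hypothetical wire form: the pointer is replaced by a byte offset. */
struct req_enc {
    int64_t num_ereqs;
    int64_t ereq_off;          /* byte offset from the array base; -1 encodes NULL */
};

/* Pointer -> offset, relative to the start of the source array. */
static void req_encode(const struct req_rec *in, struct req_enc *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].num_ereqs = in[i].num_ereqs;
        out[i].ereq_off = in[i].ereq
            ? (int64_t)((const char *)in[i].ereq - (const char *)in)
            : -1;
    }
}

/* Offset -> pointer, relative to the start of the destination array. */
static void req_decode(const struct req_enc *in, struct req_rec *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].num_ereqs = in[i].num_ereqs;
        if (in[i].ereq_off < 0) {
            out[i].ereq = NULL;
        } else {
            /* A sane offset must land inside the destination array. */
            assert(in[i].ereq_off < (int64_t)(n * sizeof *out));
            out[i].ereq = (struct req_rec *)((char *)out + in[i].ereq_off);
        }
    }
}

/* Round-trip a two-record contiguous request and return ereq - rq in bytes. */
static ptrdiff_t req_roundtrip_gap(void)
{
    struct req_rec orig[2] = { { 4194304, NULL }, { 1, NULL } };
    struct req_enc wire[2];
    struct req_rec back[2];

    orig[0].ereq = &orig[1];   /* parent points at the next record */
    req_encode(orig, wire, 2);
    req_decode(wire, back, 2);
    return (const char *)back[0].ereq - (const char *)&back[0];
}
```

For a contiguous request, `req_roundtrip_gap()` should return `sizeof(struct req_rec)`. The decoded `ereq` sits exactly one record past its parent, which is the "numerically very close" relationship expected of `rq` and `rq->ereq`.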

> Use gdb to print the value of req->cur[req->lvl].rq and also what it
> points to.  

(gdb) p req->cur[req->lvl].rq->ereq
$1 = (struct PINT_Request *) 0xe7000739518
(gdb) p req->cur[req->lvl].rq
$2 = (PINT_Request *) 0x7394c0
(gdb) p *$2
$3 = {offset = 0, num_ereqs = 4194304, num_blocks = 1, stride = 0, 
  ub = 4194304, lb = 0, aggregate_size = 4194304, num_contig_chunks = 1, 
  depth = 2, num_nested_req = 1, committed = -1, refcount = 1, 
  ereq = 0xe7000739518, sreq = 0x0}

> I can only assume the values printed for the contents
> of that record in the debug listing are correct.  The difference
> between rq and rq->ereq should be a multiple of the size of the
> request record, which should be 68 bytes.  Unless this is a really
> complex type, the multiple should be 1 or 2 or something small.

Well, since we're working through the VFS, we only ever have to deal
with contiguous requests.
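The check suggested above can be done by hand on the two addresses from the gdb session. A small sketch, assuming 64-bit addresses and `ereq >= rq`. The helpers `ereq_distance` and `low32_distance` are hypothetical, and the record size is a parameter because 68 bytes is an estimate rather than a measured `sizeof(PINT_Request)` for this build.

```c
#include <stdint.h>

/* Values copied from the gdb session above. */
#define RQ_ADDR   UINT64_C(0x7394c0)
#define EREQ_ADDR UINT64_C(0xe7000739518)

/* Split the ereq - rq distance into whole records plus a remainder.
 * rec_size is an assumption: the 68-byte estimate, or sizeof(PINT_Request)
 * on the build being debugged. Assumes ereq >= rq. */
static void ereq_distance(uint64_t rq, uint64_t ereq, uint64_t rec_size,
                          uint64_t *records, uint64_t *remainder)
{
    uint64_t diff = ereq - rq;
    *records = diff / rec_size;
    *remainder = diff % rec_size;
}

/* The same distance computed on the low 32 bits only, ignoring the high word. */
static uint32_t low32_distance(uint64_t rq, uint64_t ereq)
{
    return (uint32_t)ereq - (uint32_t)rq;
}
```

With a 68-byte record the full distance leaves a remainder of 44, so `ereq` is not a whole number of records past `rq`, and the quotient is in the hundreds of billions rather than 1 or 2. The low 32 bits alone differ by only 88 bytes, which hints that the upper bits of `ereq` are what got corrupted; that is a reading of two numbers, not a diagnosis.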

> If this turns out to be the problem, then we need to trace the
> conversion process in gdb and see where things are going wrong.  Won't
> surprise me if it is another signed/unsigned number conversion error
> thing.  Those are tricky to get right.
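To illustrate the kind of signed/unsigned slip suspected here, a generic sketch (hypothetical helpers, not PVFS2 code) that adds a negative 32-bit offset to a base address once correctly and once through an unsigned 32-bit intermediate.

```c
#include <stdint.h>

/* Correct: sign-extend the signed 32-bit offset before adding it to the base.
 * Unsigned wraparound makes base + (2^64 - k) equal base - k. */
static uint64_t add_offset_signed(uint64_t base, int32_t off)
{
    return base + (uint64_t)(int64_t)off;
}

/* Buggy: the offset passes through uint32_t first, so a negative value is
 * zero-extended and the result lands about 4 GiB above the intended address. */
static uint64_t add_offset_via_u32(uint64_t base, int32_t off)
{
    return base + (uint64_t)(uint32_t)off;
}
```

With base `0x739518` and offset `-88`, the first helper returns `0x7394c0` while the second returns `0x1007394c0`. This does not by itself reproduce the `0xe70` high word seen in gdb; it only shows how a single bad cast can turn a small offset into a wild pointer.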

Hm. That might be tough: we have to run bonnie for a minute or two
before we hit this state.  I'll run bonnie in gdb and try to see if
there's a specific write call that causes problems.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Labs, IL USA                B29D F333 664A 4280 315B
