[PVFS2-developers] CVS pvfs2-server server daemons crashing
Robert Latham
robl at mcs.anl.gov
Wed Jun 1 19:00:35 EDT 2005
On Wed, May 04, 2005 at 01:52:56PM -0400, Walter B. Ligon III wrote:
> Something is very wrong here. To tell the truth, I'm not exactly sure
> why line 161 is in the code, because we should never get to that point.
Yes, I know it's taken me a month to follow up on this thread. I
honestly didn't know we had an Opteron machine floating around that I
could use :>
Anyway, Walt, you're absolutely right. If you build pvfs2 without
optimization, the pvfs2-server segfault occurs in a slightly different
location:
#0 0x0000000000427337 in PINT_Process_request (req=0x613ed0, mem=0x0,
rfdata=0x5e2540, result=0x5fd9d8, mode=17)
at ../pvfs2-source/src/io/description/pint-request.c:215
#1 0x00000000004233b3 in trove_write_callback_fn (user_ptr=0x5fd9c8,
error_code=0)
at ../pvfs2-source/src/io/flow/flowproto-bmi-trove/flowproto-multiqueue.c:1271
#2 0x0000000000421c67 in fp_multiqueue_post (flow_d=0x5e24d0)
at ../pvfs2-source/src/io/flow/flowproto-bmi-trove/flowproto-multiqueue.c:619
#3 0x000000000041f8a9 in PINT_flow_post (flow_d=0x5e24d0)
at ../pvfs2-source/src/io/flow/flow.c:405
#4 0x00000000004196c1 in job_flow (flow_d=0x5e24d0, user_ptr=0x5eb8a0,
status_user_tag=0, out_status_p=0x5a4df0, id=0x7fbffff058, context_id=0,
timeout_sec=30) at ../pvfs2-source/src/io/job/job.c:1177
#5 0x000000000044f1e4 in io_start_flow (s_op=0x5eb8a0, js_p=0x5a4df0)
at io.sm:241
#6 0x000000000040ea8d in PINT_state_machine_next (s=0x5eb8a0, r=0x5a4df0)
at state-machine-fns.h:140
#7 0x000000000040c1b2 in main (argc=4, argv=0x7fbffff1a8)
at ../pvfs2-source/src/server/pvfs2-server.c:360
(This seems like the perfect time to bust out some valgrind, but
unfortunately the amd64 port doesn't yet know how to handle the
pwrite64 syscall.)
A couple of odd things turn up in GDB:
(gdb) p req->cur[req->lvl].rq->ereq
$3 = (struct PINT_Request *) 0xe70006437f8
(gdb) p * $3
Cannot access memory at address 0xe70006437f8
(gdb) p req->cur[req->lvl].rq->ereq->num_contig_chunks
Cannot access memory at address 0xe7000643828
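The two addresses gdb rejects can be taken apart with a little arithmetic. The sketch below uses hypothetical helper names (addr_high32, addr_low32, field_offset; none of this is PVFS2 code) to split the bad ereq value into its 32-bit halves and to compute how far num_contig_chunks sits from the start of the struct. Reading the halves this way is only a hypothesis about the corruption, not a diagnosis.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit address (hypothetical helper). */
static uint32_t addr_high32(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Lower 32 bits of a 64-bit address (hypothetical helper). */
static uint32_t addr_low32(uint64_t addr)
{
    return (uint32_t)(addr & 0xffffffffu);
}

/* Distance between a struct base address and the address gdb reports
 * for one of its fields, i.e. that field's offset in bytes. */
static uint64_t field_offset(uint64_t base, uint64_t field)
{
    assert(field >= base);
    return field - base;
}
```

For the values above, the high half of 0xe70006437f8 is 0xe70, the low half is 0x6437f8, and num_contig_chunks sits 0x30 (48) bytes into struct PINT_Request in this build. The low half lies about 0x2f928 bytes (roughly 190 KiB) above req = 0x613ed0, in the same neighbourhood as the other heap pointers in the backtrace. That is at least consistent with an otherwise valid pointer whose upper 32 bits got clobbered, though it doesn't prove it.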
Linking with efence (Electric Fence) didn't help narrow down the problem at all.
(gdb) p *req
$2 = {cur = 0x2a9c0f1fa0, lvl = 0, bytes = 0, type_offset = 0,
target_offset = 4194304, final_offset = 8388608, eof_flag = 0 '\0'}
I turned on 'request' logging. The last thing reported before the
segfault was this:
PINT_New_request_state
PINT_Process_request
tiling 3 copies
skipping ahead to target_offset
Do seq of 0 ne 4194304 st 0 nb 1 ub 4194304 lb 0 as 4194304 co 0
lvl 0 el 0 blk 0 by 0
to 0 ta 4194304 fi 8388608
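If, as "ub 4194304 lb 0" in the log suggests, each tiled copy of the type spans 4194304 bytes (4 MiB), the offsets in the request state line up exactly with copy boundaries. The helper below is a sketch with hypothetical names (tile_pos, locate_offset, window_bytes); it assumes the tiles start at byte 0 and that a copy's extent is ub - lb, which is an interpretation of the log fields rather than something the log spells out.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a byte offset within back-to-back copies of a type. */
struct tile_pos {
    int64_t tile;   /* which copy of the type (0-based) */
    int64_t within; /* byte offset inside that copy */
};

/* Assumes copies are laid out contiguously from byte 0 with
 * extent ub - lb (hypothetical reading of the log's ub/lb fields). */
static struct tile_pos locate_offset(int64_t offset, int64_t lb, int64_t ub)
{
    int64_t extent = ub - lb;
    assert(extent > 0 && offset >= 0);
    struct tile_pos p = { offset / extent, offset % extent };
    return p;
}

/* Number of bytes in the processing window [target_offset, final_offset). */
static int64_t window_bytes(int64_t target_offset, int64_t final_offset)
{
    assert(final_offset >= target_offset);
    return final_offset - target_offset;
}
```

Under those assumptions, target_offset 4194304 lands at the very start of copy 1 and final_offset 8388608 at the very start of copy 2, so the window covers exactly one 4 MiB copy. Exact-boundary cases like this are a common place for skip-ahead logic to step one element too far, so they may be worth a look around pint-request.c:215.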
Does any of this make more sense than last time?
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Labs, IL USA
A215 0178 EA2D B059 8CDF B29D F333 664A 4280 315B