[Pvfs2-users] "Remote Endpoint is Closed" error starting pvfs2-server
Scott Atchley
atchley at myri.com
Sat Aug 28 16:00:33 EDT 2010
Josh,
Nice digging.
Let me ping the Open-MX developer and check whether Open-MX maintains session IDs the same way MX does. We made some changes in the last release or two.
Scott
On Aug 28, 2010, at 3:01 PM, jrandall at well.ox.ac.uk wrote:
> FWIW, I've run the server in gdb to see what is causing the seg fault when
> a client tries to connect remotely:
>
>> [D 08/28 19:40] bmi_mx: CONN_REQ from mx://begbie:0:0.
>> [D 08/28 19:40] bmi_mx: bmx_unexpected_recv rx match= 0xc000000100000100 length= 16.
>> [D 08/28 19:40] bmi_mx: bmx_handle_conn_req returned RX match 0xc000000100000100 with Success.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffff06c0910 (LWP 3026)]
>> bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
>> 2403 } else if (sid != peer->mxp_sid) { /* reconnecting peer */
>> (gdb) bt
>> #0 bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
>> #1 bmx_connection_handlers () at src/io/bmi/bmi_mx/mx.c:2561
>> #2 0x0000000000476102 in BMI_mx_testunexpected (incount=196610, outcount=0xb8f31b2a007fc118, ui=0x7ffff06bfe38, max_idle_time=-261357980)
>> at src/io/bmi/bmi_mx/mx.c:2820
>> #3 0x00000000004549b2 in BMI_testunexpected (incount=<value optimised out>, outcount=<value optimised out>, info_array=<value optimised out>, max_idle_time_ms=0)
>> at src/io/bmi/bmi.c:1000
>> #4 0x000000000044d5c0 in bmi_thread_function (ptr=<value optimised out>)
>> at src/io/job/thread-mgr.c:182
>> #5 0x00007ffff7292a04 in start_thread () from /lib/libpthread.so.0
>> #6 0x00007ffff6bcdd4d in clone () from /lib/libc.so.6
>> #7 0x0000000000000000 in ?? ()
>
>
> Line 2403 of mx.c is:
>
>> } else if (sid != peer->mxp_sid) { /* reconnecting peer */
>
> So I examined those variables in gdb:
>
>> (gdb) print sid
>> $1 = 3102939946
>> (gdb) print peer
>> $2 = (struct bmx_peer *) 0x5fb7569100030002
>> (gdb) print peer->mxp_sid
>> Cannot access memory at address 0x5fb756910003002a
>
> I looked back to where peer was getting set, to lines 2358 to 2362:
>
>> {
>> void *peerp = &peer;
>> mx_get_endpoint_addr_context(status.source, &peerp);
>> peer = (struct bmx_peer *) peerp;
>> }
>
> I checked status.source, which seems to be OK:
>> (gdb) print status.source
>> $3 = {stuff = {8415496, 13327025589530378520}}
>
> I set a breakpoint at line 2361 and ran it again:
>
>> Breakpoint 1, bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2361
>> 2361 peer = (struct bmx_peer *) peerp;
>> (gdb) print peerp
>> $2 = (void *) 0xcfa4b15e00030001
>> (gdb) step
>> 2363 if (peer == NULL) { /* new peer */
>> (gdb) print peer
>> $3 = (struct bmx_peer *) 0x0
>> (gdb) print peer->mxp_sid
>> Cannot access memory at address 0x28
>
> It seems that mx_get_endpoint_addr_context() is potentially not returning
> the expected structure?
>
> Josh.
>
>
>
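For illustration only, here is a minimal sketch of one defensive pattern around the peer lookup quoted above: start the context pointer at NULL rather than &peer, and check the return code before trusting the result. Everything in it is an assumption made for the sake of a self-contained example. stub_get_addr_context() is a hypothetical stand-in for mx_get_endpoint_addr_context(), the struct is reduced to the one field discussed, and lookup_peer() is a made-up helper that is not in mx.c. It would not by itself explain the garbage pointer seen in gdb; it only rules out one way of ending up with a bogus non-NULL peer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real struct bmx_peer in mx.c. */
struct bmx_peer {
    uint32_t mxp_sid; /* session id, named as in the quoted code */
};

/* Hypothetical stub playing the role of mx_get_endpoint_addr_context().
 * On success it returns 0 and stores the saved context in *context.
 * On failure it returns nonzero and leaves *context untouched. */
static int stub_get_addr_context(void *stored, int fail, void **context)
{
    assert(context != NULL);
    if (fail)
        return 1;
    *context = stored;
    return 0;
}

/* Hypothetical helper: returns the known peer, or NULL when no context
 * was stored or the lookup failed, so the caller takes the "new peer"
 * path instead of dereferencing an unverified pointer. */
static struct bmx_peer *lookup_peer(void *stored, int fail)
{
    void *peerp = NULL; /* NULL, not &peer: a failed call cannot leave
                         * a non-NULL value behind */

    if (stub_get_addr_context(stored, fail, &peerp) != 0)
        return NULL;
    return (struct bmx_peer *) peerp;
}
```

With this shape, a caller only reaches the sid != peer->mxp_sid comparison when the lookup reported success and returned a non-NULL context; a failed lookup is handled like a new peer.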