[Pvfs2-users] "Remote Endpoint is Closed" error starting pvfs2-server

Scott Atchley atchley at myri.com
Sat Aug 28 16:00:33 EDT 2010


Josh,

Nice digging.

Let me ping the Open-MX developer and see whether Open-MX maintains session ids the same way MX does. We made some changes in the last release or two.

Scott


On Aug 28, 2010, at 3:01 PM, jrandall at well.ox.ac.uk wrote:

> FWIW, I've run the server in gdb to see what is causing the seg fault when
> a client tries to connect remotely:
> 
>> [D 08/28 19:40] bmi_mx: CONN_REQ from mx://begbie:0:0.
>> [D 08/28 19:40] bmi_mx: bmx_unexpected_recv rx match= 0xc000000100000100 length= 16.
>> [D 08/28 19:40] bmi_mx: bmx_handle_conn_req returned RX match 0xc000000100000100 with Success.
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7ffff06c0910 (LWP 3026)]
>> bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
>> 2403                            } else if (sid != peer->mxp_sid) { /* reconnecting peer */
>> (gdb) bt
>> #0  bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2403
>> #1  bmx_connection_handlers () at src/io/bmi/bmi_mx/mx.c:2561
>> #2  0x0000000000476102 in BMI_mx_testunexpected (incount=196610, outcount=0xb8f31b2a007fc118, ui=0x7ffff06bfe38, max_idle_time=-261357980)
>>    at src/io/bmi/bmi_mx/mx.c:2820
>> #3  0x00000000004549b2 in BMI_testunexpected (incount=<value optimised out>, outcount=<value optimised out>, info_array=<value optimised out>, max_idle_time_ms=0)
>>    at src/io/bmi/bmi.c:1000
>> #4  0x000000000044d5c0 in bmi_thread_function (ptr=<value optimised out>)
>>    at src/io/job/thread-mgr.c:182
>> #5  0x00007ffff7292a04 in start_thread () from /lib/libpthread.so.0
>> #6  0x00007ffff6bcdd4d in clone () from /lib/libc.so.6
>> #7  0x0000000000000000 in ?? ()
> 
> 
> Line 2403 of mx.c is:
> 
>> } else if (sid != peer->mxp_sid) { /* reconnecting peer */
> 
> So I examined those variables in gdb:
> 
>> (gdb) print sid
>> $1 = 3102939946
>> (gdb) print peer
>> $2 = (struct bmx_peer *) 0x5fb7569100030002
>> (gdb) print peer->mxp_sid
>> Cannot access memory at address 0x5fb756910003002a
> 
> I looked back to where peer was getting set, to lines 2358 to 2362:
> 
>>                        {
>>                                void *peerp = &peer;
>>                                mx_get_endpoint_addr_context(status.source, &peerp);
>>                                peer = (struct bmx_peer *) peerp;
>>                        }
> 
> I checked status.source, which seems to be OK:
>> (gdb) print status.source
>> $3 = {stuff = {8415496, 13327025589530378520}}
> 
> I set a breakpoint at line 2361 and ran it again:
> 
>> Breakpoint 1, bmx_handle_conn_req () at src/io/bmi/bmi_mx/mx.c:2361
>> 2361                                    peer = (struct bmx_peer *) peerp;
>> (gdb) print peerp
>> $2 = (void *) 0xcfa4b15e00030001
>> (gdb) step
>> 2363                            if (peer == NULL) { /* new peer */
>> (gdb) print peer
>> $3 = (struct bmx_peer *) 0x0
>> (gdb) print peer->mxp_sid
>> Cannot access memory at address 0x28
> 
> It seems that mx_get_endpoint_addr_context() may not be returning the
> expected structure?
> 
> Josh.
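
For illustration only, the lookup-and-check pattern traced above can be sketched in standalone C. This is not the bmi_mx code and does not call the MX or Open-MX library: demo_peer, demo_get_context, demo_lookup_peer and demo_classify are hypothetical stand-ins, and the failure behaviour of the lookup (returning an error and leaving the output untouched) is an assumption, not something the thread confirms about mx_get_endpoint_addr_context(). The only details taken from the gdb session are the field name mxp_sid and its offset: the fault at address 0x28 with peer == 0x0 implies the field sits 0x28 bytes into the struct in that build, which also matches 0x5fb7569100030002 + 0x28 = 0x5fb756910003002a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bmx_peer. Only the field name mxp_sid
 * and its offset (0x28, read off the gdb fault address) come from the thread. */
struct demo_peer {
    char     pad[0x28];   /* placeholder for whatever precedes mxp_sid */
    uint32_t mxp_sid;     /* session id last seen from this peer */
};

/* The padding must reproduce the offset observed in gdb. */
_Static_assert(offsetof(struct demo_peer, mxp_sid) == 0x28,
               "mxp_sid is expected at offset 0x28");

/* Hypothetical context lookup (an assumption, not the MX API).
 * Returns 0 and stores the context on success; returns -1 and leaves
 * *context untouched when no context is available. */
int demo_get_context(int have_context, struct demo_peer *stored, void **context)
{
    assert(context != NULL);
    if (!have_context)
        return -1;
    *context = stored;
    return 0;
}

/* Start from NULL and trust the output only if the lookup succeeded,
 * so a failed lookup can never yield a stale or garbage pointer. */
struct demo_peer *demo_lookup_peer(int have_context, struct demo_peer *stored)
{
    void *peerp = NULL;
    if (demo_get_context(have_context, stored, &peerp) != 0)
        return NULL;
    return (struct demo_peer *) peerp;
}

enum demo_action { DEMO_NEW_PEER, DEMO_RECONNECT, DEMO_SAME_SESSION };

/* Mirrors the branch order in the thread: test for NULL first, and only
 * then compare session ids, so peer->mxp_sid is never read through NULL. */
enum demo_action demo_classify(const struct demo_peer *peer, uint32_t sid)
{
    if (peer == NULL)
        return DEMO_NEW_PEER;
    if (sid != peer->mxp_sid)
        return DEMO_RECONNECT;
    return DEMO_SAME_SESSION;
}
```

With this shape, a caller would pass the result of demo_lookup_peer() straight to demo_classify(). A failed lookup is then treated as a new peer instead of being dereferenced. Whether that is the right recovery for bmi_mx is a separate question.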
> 
> 
> 



