[Pvfs2-users] "Remote Endpoint is Closed" error starting pvfs2-server

Joshua Randall jrandall at well.ox.ac.uk
Sat Aug 14 11:50:06 EDT 2010


I am using pvfs-2.8.2, which works over TCP, but ideally I want to
run it over Open-MX.  I have installed and configured open-mx-1.3.1,
and it is running on all three servers.

Does anyone actually have Open-MX working with PVFS2?  I have set the  
MX_IMM_ACK environment variable to 1 as directed in the FAQ, and all  
my connectivity tests with Open-MX seem to work just fine.
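
Since the server below is started through sudo, it may be worth confirming that MX_IMM_ACK actually reaches the server process: sudo's default env_reset policy drops most of the caller's variables, though whether that applies here depends on the local sudoers configuration. The following is only a sketch of one way to make the assignment explicit, using the `sudo env NAME=value command` form. The helper name build_server_command is hypothetical; the paths and variable names are the ones used in this message.

```python
import shlex


def build_server_command(env_vars,
                         server="/usr/local/sbin/pvfs2-server",
                         conf="/etc/pvfs2-fs.conf"):
    """Build an argv list that sets env_vars for the server via `sudo env`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # Sort the names so the resulting command line is deterministic.
    assignments = [f"{name}={value}" for name, value in sorted(env_vars.items())]
    return ["sudo", "env", *assignments, server, conf, "-d"]


command = build_server_command({"MX_IMM_ACK": "1", "PVFS2_DEBUGMASK": "all"})
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(command))
```

Passing the variables on the command line like this removes any doubt about what the server process inherits, at the cost of a longer invocation.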

Below I have attached relevant output and configuration files.

Thanks for any help you can offer!

Josh.



The output of omx_info shows that all three hosts are successfully
communicating over Ethernet:
> $ sudo /opt/open-mx/bin/omx_info
> Open-MX version 1.3.1
> build: jrandall at tommy:/usr/local/src/open-mx/open-mx-1.3.1 Fri Aug 13 19:07:08 BST 2010
>
> Found 1 boards (32 max) supporting 32 endpoints each:
> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
>   managed by driver 'ixgbe'
>   attached to numa node 0
>
> Peer table is ready, mapper is 00:00:00:00:00:00
> ================================================
>  0) 00:1b:21:4f:4b:e6 tommy:0
>  1) 00:1b:21:4d:ba:92 renton:0
>  2) 00:1b:21:4f:4d:5a begbie:0


The output of omx_endpoint_info shows all 32 endpoints are available.
> $ sudo /opt/open-mx/bin/omx_endpoint_info
> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
> ==============================================
>  raw   open by pid 20653 (omxoed)
> 0 regular endpoints open (out of 32)
>

When I run pvfs2-server with PVFS2_DEBUGMASK="all", I get a "Remote
Endpoint is Closed" error and the server exits with code 255:
> $ sudo /usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf -d

> [S 08/14 16:40] PVFS2 Server on node tommy version 2.8.2 starting...
> [D 08/14 16:40] Logging all (mask 18446744073709551615)
> [D 08/14 16:40] PINT_encode_initialize
> [D 08/14 16:40] lebf_initialize
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40
> [D 08/14 16:40] node stored at 0
> [D 08/14 16:40] clearing tree
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40
> [D 08/14 16:40] node stored at 0
> [D 08/14 16:40] clearing tree
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] Passing mx://tommy:0:0 as BMI listen address.
> OMX: Emulating MX_DISABLE_SHMEM as OMX_DISABLE_SHARED
> OMX: Forcing shared comms to disabled
> OMX: Setting 4 bits of context id at offset 60 in matching
> [D 08/14 16:40] Server using shm key hint: 1937657271
> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
> [D 08/14 16:40] dbpf_thread_initialize: initialized
> [D 08/14 16:40] dbpf_thread_function started
> [D 08/14 16:40] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
> OMX: Completing iconnect request: Remote Endpoint is Closed



My pvfs2-fs.conf file contains:
> <Defaults>
> 	UnexpectedRequests 50
> 	EventLogging all
> 	EnableTracing no
> 	LogStamp datetime
> 	BMIModules bmi_mx
> 	FlowModules flowproto_multiqueue
> 	PerfUpdateInterval 1000
> 	ServerJobBMITimeoutSecs 30
> 	ServerJobFlowTimeoutSecs 30
> 	ClientJobBMITimeoutSecs 300
> 	ClientJobFlowTimeoutSecs 300
> 	ClientRetryLimit 5
> 	ClientRetryDelayMilliSecs 2000
> 	PrecreateBatchSize 512
> 	PrecreateLowThreshold 256
>
> 	StorageSpace /raid/pvfs2-storage-space
> 	LogFile /var/log/pvfs2-server.log
> </Defaults>
>
> <Aliases>
> 	Alias begbie mx://begbie:0:0
> 	Alias renton mx://renton:0:0
> 	Alias tommy mx://tommy:0:0
> </Aliases>
>
> <Filesystem>
> 	Name pvfs2-fs
> 	ID 1937657241
> 	RootHandle 1048576
> 	FileStuffing yes
> 	<MetaHandleRanges>
> 		Range begbie 3-1537228672809129302
> 		Range renton 1537228672809129303-3074457345618258602
> 		Range tommy 3074457345618258603-4611686018427387902
> 	</MetaHandleRanges>
> 	<DataHandleRanges>
> 		Range begbie 4611686018427387903-6148914691236517202
> 		Range renton 6148914691236517203-7686143364045646502
> 		Range tommy 7686143364045646503-9223372036854775802
> 	</DataHandleRanges>
> 	<StorageHints>
> 		TroveSyncMeta yes
> 		TroveSyncData no
> 		TroveMethod alt-aio
> 	</StorageHints>
> </Filesystem>
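
As a sanity check on the configuration above, the handle ranges can be verified not to overlap between servers. The sketch below is an illustration only, not part of PVFS2: the function names parse_ranges and find_overlaps are hypothetical, and the Range lines are copied from the file above with the quoting prefix removed.

```python
import re

# Range lines copied from the pvfs2-fs.conf shown above.
CONFIG = """\
<MetaHandleRanges>
    Range begbie 3-1537228672809129302
    Range renton 1537228672809129303-3074457345618258602
    Range tommy 3074457345618258603-4611686018427387902
</MetaHandleRanges>
<DataHandleRanges>
    Range begbie 4611686018427387903-6148914691236517202
    Range renton 6148914691236517203-7686143364045646502
    Range tommy 7686143364045646503-9223372036854775802
</DataHandleRanges>
"""

RANGE_RE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$")


def parse_ranges(text):
    """Return a list of (alias, start, end) tuples, one per Range line."""
    ranges = []
    for line in text.splitlines():
        match = RANGE_RE.match(line)
        if match:
            alias, start, end = match.groups()
            ranges.append((alias, int(start), int(end)))
    return ranges


def find_overlaps(ranges):
    """Return neighbouring ranges (sorted by start) that share a handle.

    An empty result means no two ranges overlap at all.
    """
    ordered = sorted(ranges, key=lambda r: r[1])
    problems = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[1] <= prev[2]:  # next range starts before the previous one ends
            problems.append((prev, cur))
    return problems


ranges = parse_ranges(CONFIG)
overlaps = find_overlaps(ranges)
print(f"{len(ranges)} ranges parsed, {len(overlaps)} overlapping pairs")
```

Python integers are arbitrary precision, so the 63-bit handle values are compared exactly.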



