[Pvfs2-users] "Remote Endpoint is Closed" error starting pvfs2-server
Joshua Randall
jrandall at well.ox.ac.uk
Sat Aug 14 11:50:06 EDT 2010
I am using pvfs-2.8.2, and it works over TCP, but ideally I want to
run it over Open-MX. I have installed and configured open-mx-1.3.1,
and it is running on all three servers.
Does anyone actually have Open-MX working with PVFS2? I have set the
MX_IMM_ACK environment variable to 1 as directed in the FAQ, and all
my connectivity tests with Open-MX seem to work just fine.
I have included the relevant output and configuration files below.
Thanks for any help you can offer!
Josh.
The output of omx_info shows that all three hosts are communicating
successfully over Ethernet.
> $ sudo /opt/open-mx/bin/omx_info
> Open-MX version 1.3.1
> build: jrandall at tommy:/usr/local/src/open-mx/open-mx-1.3.1 Fri Aug 13 19:07:08 BST 2010
>
> Found 1 boards (32 max) supporting 32 endpoints each:
> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
> managed by driver 'ixgbe'
> attached to numa node 0
>
> Peer table is ready, mapper is 00:00:00:00:00:00
> ================================================
> 0) 00:1b:21:4f:4b:e6 tommy:0
> 1) 00:1b:21:4d:ba:92 renton:0
> 2) 00:1b:21:4f:4d:5a begbie:0
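To script the same check, for example before each server start, a minimal Python sketch could parse the peer table. It assumes only the `N) MAC host:board` line format shown in the output above; `parse_peer_table` and `missing_hosts` are hypothetical helpers, not part of Open-MX.

```python
import re

# Matches peer-table lines such as "1) 00:1b:21:4d:ba:92 renton:0".
# Assumption: the format is taken from the omx_info output quoted above.
PEER_LINE = re.compile(
    r"^(\d+)\)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+(\S+):(\d+)$"
)


def parse_peer_table(text):
    """Return {hostname: (peer_index, mac, board)} for each peer-table line."""
    peers = {}
    for raw in text.splitlines():
        # Drop any "> " quoting and surrounding whitespace before matching.
        match = PEER_LINE.match(raw.lstrip("> ").strip())
        if match:
            index, mac, host, board = match.groups()
            peers[host] = (int(index), mac, int(board))
    return peers


def missing_hosts(text, expected):
    """Return the expected hostnames that are absent from the peer table."""
    peers = parse_peer_table(text)
    return [host for host in expected if host not in peers]


sample = """\
0) 00:1b:21:4f:4b:e6 tommy:0
1) 00:1b:21:4d:ba:92 renton:0
2) 00:1b:21:4f:4d:5a begbie:0
"""

print(parse_peer_table(sample)["renton"])  # (1, '00:1b:21:4d:ba:92', 0)
print(missing_hosts(sample, ["begbie", "renton", "tommy"]))  # []
```

An empty list from `missing_hosts` only shows that each host is visible in the Open-MX peer table; it says nothing about whether a pvfs2-server endpoint is open on that host.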
The output of omx_endpoint_info shows that all 32 endpoints are available.
> $ sudo /opt/open-mx/bin/omx_endpoint_info
> tommy:0 (board #0 name eth3 addr 00:1b:21:4f:4b:e6)
> ==============================================
> raw open by pid 20653 (omxoed)
> 0 regular endpoints open (out of 32)
>
When I run pvfs2-server with PVFS2_DEBUGMASK="all", I get a "Remote
Endpoint is Closed" error, and the server exits with code 255.
> $ sudo /usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf -d
> [S 08/14 16:40] PVFS2 Server on node tommy version 2.8.2 starting...
> [D 08/14 16:40] Logging all (mask 18446744073709551615)
> [D 08/14 16:40] PINT_encode_initialize
> [D 08/14 16:40] lebf_initialize
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40
> [D 08/14 16:40] node stored at 0
> [D 08/14 16:40] clearing tree
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] PINT_do_request_commit: commit node 0x7fff0a7e6e40
> [D 08/14 16:40] node stored at 0
> [D 08/14 16:40] clearing tree
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_req_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_req
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] check_resp_size
> [D 08/14 16:40] encode_common
> [D 08/14 16:40] lebf_encode_resp
> [D 08/14 16:40] lebf_encode_rel
> [D 08/14 16:40] Passing mx://tommy:0:0 as BMI listen address.
> OMX: Emulating MX_DISABLE_SHMEM as OMX_DISABLE_SHARED
> OMX: Forcing shared comms to disabled
> OMX: Setting 4 bits of context id at offset 60 in matching
> [D 08/14 16:40] Server using shm key hint: 1937657271
> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
> [D 08/14 16:40] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
> [D 08/14 16:40] dbpf_thread_initialize: initialized
> [D 08/14 16:40] dbpf_thread_function started
> [D 08/14 16:40] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
> OMX: Completing iconnect request: Remote Endpoint is Closed
My pvfs2-fs.conf file contains:
> <Defaults>
> UnexpectedRequests 50
> EventLogging all
> EnableTracing no
> LogStamp datetime
> BMIModules bmi_mx
> FlowModules flowproto_multiqueue
> PerfUpdateInterval 1000
> ServerJobBMITimeoutSecs 30
> ServerJobFlowTimeoutSecs 30
> ClientJobBMITimeoutSecs 300
> ClientJobFlowTimeoutSecs 300
> ClientRetryLimit 5
> ClientRetryDelayMilliSecs 2000
> PrecreateBatchSize 512
> PrecreateLowThreshold 256
>
> StorageSpace /raid/pvfs2-storage-space
> LogFile /var/log/pvfs2-server.log
> </Defaults>
>
> <Aliases>
> Alias begbie mx://begbie:0:0
> Alias renton mx://renton:0:0
> Alias tommy mx://tommy:0:0
> </Aliases>
>
> <Filesystem>
> Name pvfs2-fs
> ID 1937657241
> RootHandle 1048576
> FileStuffing yes
> <MetaHandleRanges>
> Range begbie 3-1537228672809129302
> Range renton 1537228672809129303-3074457345618258602
> Range tommy 3074457345618258603-4611686018427387902
> </MetaHandleRanges>
> <DataHandleRanges>
> Range begbie 4611686018427387903-6148914691236517202
> Range renton 6148914691236517203-7686143364045646502
> Range tommy 7686143364045646503-9223372036854775802
> </DataHandleRanges>
> <StorageHints>
> TroveSyncMeta yes
> TroveSyncData no
> TroveMethod alt-aio
> </StorageHints>
> </Filesystem>
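One way to rule out the configuration file itself is to check that every `Range` names a defined `Alias`, that each alias uses the `mx://host:board:endpoint` form, and that no handle ranges overlap. The Python sketch below does only that. `check_config` is a hypothetical helper that understands just the one-directive-per-line syntax quoted above, and it assumes that all meta and data ranges must be mutually disjoint and fit in a signed 64-bit integer.

```python
import re

# Hypothetical sanity check for the Alias and Range lines of a pvfs2-fs.conf
# file. It understands only the one-directive-per-line syntax quoted above.
ALIAS_LINE = re.compile(r"^Alias\s+(\S+)\s+mx://([^:\s]+):(\d+):(\d+)$")
RANGE_LINE = re.compile(r"^Range\s+(\S+)\s+(\d+)-(\d+)$")
MAX_HANDLE = 2**63 - 1  # assumption: handles must fit in a signed 64-bit value


def check_config(text):
    """Return a list of problems; an empty list means none were found."""
    aliases, ranges, problems = set(), [], []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        parts = line.split()
        keyword = parts[0] if parts else ""
        if keyword == "Alias":
            m = ALIAS_LINE.match(line)
            if m:
                aliases.add(m.group(1))
            else:
                problems.append(f"not an mx://host:board:endpoint alias: {line}")
        elif keyword == "Range":
            m = RANGE_LINE.match(line)
            if m:
                ranges.append((int(m.group(2)), int(m.group(3)), m.group(1)))
            else:
                problems.append(f"unparsable range: {line}")

    # Assumption: meta and data ranges must all be mutually disjoint.
    # After sorting by start, any overlap shows up between neighbours.
    ranges.sort()
    for low, high, server in ranges:
        if server not in aliases:
            problems.append(f"range for unknown alias {server}")
        if low > high or high > MAX_HANDLE:
            problems.append(f"bad range {low}-{high} for {server}")
    for (_, prev_high, prev), (low, _, cur) in zip(ranges, ranges[1:]):
        if low <= prev_high:
            problems.append(f"ranges of {prev} and {cur} overlap")
    return problems


sample = """\
Alias begbie mx://begbie:0:0
Alias renton mx://renton:0:0
Range begbie 3-1537228672809129302
Range renton 1537228672809129300-3074457345618258602
"""
print(check_config(sample))  # ['ranges of begbie and renton overlap']
```

Applied to the Alias and Range lines in the file above, the sketch reports no problems: the six ranges are contiguous, starting at 3 and ending at 9223372036854775802, just below 2^63 - 1.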