[Pvfs2-users] PVFS server won't start
Phil Carns
carns at mcs.anl.gov
Mon Jun 29 16:42:19 EDT 2009
Hi Randy,
It looks like one of the entries in the attributes db may be damaged,
but the error output doesn't give much detail.
Any chance you could apply the attached patch and show the log output
again? It doesn't change the server's behavior; it only prints more
error messages in the cases that (I think) you are hitting.
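When you send the new output, the error-level lines are the part that
matters most. Purely as an illustrative sketch (it is not part of the
attached patch), something like the following Python could pull them out
of a log. It assumes every entry starts with the "[E MM/DD HH:MM]" style
prefix seen in your debug output and that wrapped text belongs to the
entry before it; the names error_entries and SAMPLE are made up for the
example.

```python
import re

# Entry prefix as it appears in the quoted server log, e.g.
# "[E 06/29 15:29] Error adding handle range". The layout is inferred
# from that log, not taken from the PVFS source.
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<stamp>\d{2}/\d{2} \d{2}:\d{2})\] (?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, message) tuples for each error-level entry.

    Lines without a "[X MM/DD HH:MM]" prefix are treated as wrapped
    continuations and appended to the preceding entry if it was an error.
    """
    entries = []
    in_error = False
    for raw in lines:
        line = raw.strip()
        # Tolerate email quoting ("> ") so pasted mail text also works.
        if line.startswith(">"):
            line = line[1:].strip()
        if not line:
            continue
        match = LOG_LINE.match(line)
        if match:
            in_error = match.group("level") == "E"
            if in_error:
                entries.append([match.group("stamp"), match.group("msg")])
        elif in_error:
            entries[-1][1] += " " + line
    return [tuple(entry) for entry in entries]


# A few lines copied from the debug output in this thread.
SAMPLE = """\
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[E 06/29 15:29] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[D 06/29 15:29] trove_dspace_iterate_handles failed
[E 06/29 15:29] Error adding handle range
4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
to filesystem pvfs2-fs
[E 06/29 15:29] Error: Could not initialize server; aborting.
"""

if __name__ == "__main__":
    for stamp, message in error_entries(SAMPLE.splitlines()):
        print(stamp, message)
```

To run it against a real log instead of the sample, pass the open file,
for example error_entries(open("/var/log/pvfs2-server.oss004-4.log")),
using the LogFile path from your config.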
thanks,
-Phil
Randall Martin wrote:
> I noticed a similar thread where someone ran an fsck and recovered. I
> tried an fsck with no luck. I ran db_verify on all of the .db files and
> it didn’t report any problems. Below is the debug output of the server:
>
> [D 06/29 15:29] Passing tcp://oss004-4:3337 as BMI listen address.
> [D 06/29 15:29] BMI_tcp_initialize: Initializing TCP/IP module.
> [D 06/29 15:29] BMI_tcp_initialize: TCP/IP module successfully initialized.
> [D 06/29 15:29] Server using shm key hint: 373672738
> [D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
> [D 06/29 15:29] Default socket buffers send:16384 receive:87380
> [D 06/29 15:29] Setting socket buffer size for send:0 receive:0
> [D 06/29 15:29] Reread socket buffers send:16384 receive:87380
> [D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
> [D 06/29 15:29] Default socket buffers send:16384 receive:87380
> [D 06/29 15:29] Setting socket buffer size for send:0 receive:0
> [D 06/29 15:29] Reread socket buffers send:16384 receive:87380
> [D 06/29 15:29] dbpf_thread_initialize: initialized
> [D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
> [D 06/29 15:29] dbpf_collection_lookup of coll: pvfs2-fs
> [D 06/29 15:29] dbpf using default db cache size.
> [D 06/29 15:29] dbpf using shm key: 1020239961
> [D 06/29 15:29] collection lookup: version is 0.1.4
> [D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 1 called
> [D 06/29 15:29] dbpf collection 373672578 - Setting handle timeout to
> 360000000 microseconds
> [D 06/29 15:29] - set handle re-use timeout to 360 seconds (ret=0)
> [D 06/29 15:29] dbpf collection 373672578 - Setting cache keywords of
> attribute cache to dh,
> [D 06/29 15:29] Setting dbpf_attr_cache keywords to:
> dh,
> [D 06/29 15:29] dbpf collection 373672578 - Setting cache size of
> attribute cache to 511
> [D 06/29 15:29] dbpf collection 373672578 - Setting maximum elements of
> attribute cache to 1024
> [D 06/29 15:29] dbpf collection 373672578 - Initialize collection attr.
> cache
> [D 06/29 15:29] There are 1 cacheable keywords registered
> [D 06/29 15:29] dbpf_attr_cache_initialize: initialized
> [D 06/29 15:29] dbpf collection 373672578 - Setting collection handle
> ranges to
> 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
> [D 06/29 15:29] op_queue add: 0x9f96380
> [D 06/29 15:29] dbpf_thread_function started
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] handle_new_connection: Assigning socket 11 to new method
> addr.
> [D 06/29 15:29] tcp_do_work_recv: Reading header for new op.
> [D 06/29 15:29] tcp_do_work_recv: Received new message; mode: 2.
> [D 06/29 15:29] tcp_do_work_recv: tag: 5865658
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9f96380
> [D 06/29 15:29] handle_new_connection: Assigning socket 12 to new method
> addr.
> [D 06/29 15:29] op_queue add: 0x9f9da50
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9f9da50
> [D 06/29 15:29] op_queue add: 0x9fa63d0
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fa63d0
> [D 06/29 15:29] op_queue add: 0x9fad360
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fad360
> [D 06/29 15:29] op_queue add: 0x9fb0bf0
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fb0bf0
> [D 06/29 15:29] op_queue add: 0x9fb2f90
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fb2f90
> [D 06/29 15:29] op_queue add: 0x9fb5ab0
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fb5ab0
> [D 06/29 15:29] op_queue add: 0x9fc7a30
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fc7a30
> [D 06/29 15:29] op_queue add: 0x9fca500
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fca500
> [D 06/29 15:29] op_queue add: 0x9fca690
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fca690
> [D 06/29 15:29] op_queue add: 0x9fe1980
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: 1)
> [D 06/29 15:29] op_queue add: 0x9fe1980
> [D 06/29 15:29] op_queue add: 0x9fe2330
> [D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES)
> [E 06/29 15:29] dbpf_dspace_iterate_handles_op_svc: Invalid argument
> [D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
> (DSPACE_ITERATE_HANDLES) (ret: -1073742095)
> [D 06/29 15:29] op_queue add: 0x9fe2330
> [D 06/29 15:29] trove_dspace_iterate_handles failed
> [E 06/29 15:29] Error adding handle range
> 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
> to filesystem pvfs2-fs
> [E 06/29 15:29] Error: Could not initialize server interfaces; aborting.
> [E 06/29 15:29] Error: Could not initialize server; aborting.
> [D 06/29 15:29] *** server shutdown in progress ***
>
>
> -Randy
>
> ------------------------------------------------------------------------
> From: Randall Martin <wolf at clemson.edu>
> Date: Mon, 29 Jun 2009 14:05:33 -0400
> To: <pvfs2-users at beowulf-underground.org>
> Subject: [Pvfs2-users] PVFS server won't start
>
> One of our PVFS servers crashed and now it won’t start back up. It had
> been working from June 2 until today’s crash. Any ideas on how to fix
> it? I was running the released 2.8.1 version, but I also tried the
> HEAD version with no change in symptoms.
>
> From the server log:
>
> [D 06/29 13:49] PVFS2 Server version 2.8.1pre1-2009-06-26-182521 starting.
> [E 06/29 13:49] dbpf_dspace_iterate_handles_op_svc: Invalid argument
> [E 06/29 13:49] Error adding handle range
> 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
> to filesystem pvfs2-fs
> [E 06/29 13:49] Error: Could not initialize server interfaces; aborting.
> [E 06/29 13:49] Error: Could not initialize server; aborting.
>
> My config file:
>
>
> <Defaults>
> UnexpectedRequests 50
> EventLogging none
> EnableTracing no
> LogStamp datetime
> BMIModules bmi_tcp
> FlowModules flowproto_multiqueue
> PerfUpdateInterval 1000
> ServerJobBMITimeoutSecs 30
> ServerJobFlowTimeoutSecs 30
> ClientJobBMITimeoutSecs 300
> ClientJobFlowTimeoutSecs 300
> ClientRetryLimit 60
> ClientRetryDelayMilliSecs 10000
> PrecreateBatchSize 512
> PrecreateLowThreshold 256
> </Defaults>
>
> <Aliases>
> Alias oss001-1 tcp://oss001-1:3334
> Alias oss001-2 tcp://oss001-2:3335
> Alias oss001-3 tcp://oss001-3:3336
> Alias oss001-4 tcp://oss001-4:3337
>
> Alias oss002-1 tcp://oss002-1:3334
> Alias oss002-2 tcp://oss002-2:3335
> Alias oss002-3 tcp://oss002-3:3336
> Alias oss002-4 tcp://oss002-4:3337
>
> Alias oss003-1 tcp://oss003-1:3334
> Alias oss003-2 tcp://oss003-2:3335
> Alias oss003-3 tcp://oss003-3:3336
> Alias oss003-4 tcp://oss003-4:3337
>
> Alias oss004-1 tcp://oss004-1:3334
> Alias oss004-2 tcp://oss004-2:3335
> Alias oss004-3 tcp://oss004-3:3336
> Alias oss004-4 tcp://oss004-4:3337
> </Aliases>
>
>
> <ServerOptions>
> Server oss001-1
> StorageSpace /ost1
> LogFile /var/log/pvfs2-server.oss001-1.log
> </ServerOptions>
> <ServerOptions>
> Server oss001-2
> StorageSpace /ost2
> LogFile /var/log/pvfs2-server.oss001-2.log
> </ServerOptions>
> <ServerOptions>
> Server oss001-3
> StorageSpace /ost3
> LogFile /var/log/pvfs2-server.oss001-3.log
> </ServerOptions>
> <ServerOptions>
> Server oss001-4
> StorageSpace /ost4
> LogFile /var/log/pvfs2-server.oss001-4.log
> </ServerOptions>
>
>
> <ServerOptions>
> Server oss002-1
> StorageSpace /ost5
> LogFile /var/log/pvfs2-server.oss002-1.log
> </ServerOptions>
> <ServerOptions>
> Server oss002-2
> StorageSpace /ost6
> LogFile /var/log/pvfs2-server.oss002-2.log
> </ServerOptions>
> <ServerOptions>
> Server oss002-3
> StorageSpace /ost7
> LogFile /var/log/pvfs2-server.oss002-3.log
> </ServerOptions>
> <ServerOptions>
> Server oss002-4
> StorageSpace /ost8
> LogFile /var/log/pvfs2-server.oss002-4.log
> </ServerOptions>
>
>
> <ServerOptions>
> Server oss003-1
> StorageSpace /ost9
> LogFile /var/log/pvfs2-server.oss003-1.log
> </ServerOptions>
> <ServerOptions>
> Server oss003-2
> StorageSpace /ost10
> LogFile /var/log/pvfs2-server.oss003-2.log
> </ServerOptions>
> <ServerOptions>
> Server oss003-3
> StorageSpace /ost11
> LogFile /var/log/pvfs2-server.oss003-3.log
> </ServerOptions>
> <ServerOptions>
> Server oss003-4
> StorageSpace /ost12
> LogFile /var/log/pvfs2-server.oss003-4.log
> </ServerOptions>
>
>
> <ServerOptions>
> Server oss004-1
> StorageSpace /ost13
> LogFile /var/log/pvfs2-server.oss004-1.log
> </ServerOptions>
> <ServerOptions>
> Server oss004-2
> StorageSpace /ost14
> LogFile /var/log/pvfs2-server.oss004-2.log
> </ServerOptions>
> <ServerOptions>
> Server oss004-3
> StorageSpace /ost15
> LogFile /var/log/pvfs2-server.oss004-3.log
> </ServerOptions>
> <ServerOptions>
> Server oss004-4
> StorageSpace /ost16
> LogFile /var/log/pvfs2-server.oss004-4.log
> </ServerOptions>
>
> <Filesystem>
> Name pvfs2-fs
> ID 373672578
> RootHandle 1048576
> FileStuffing yes
> <MetaHandleRanges>
> Range oss001-1 3-288230376151711745
> Range oss001-2 288230376151711746-576460752303423488
> Range oss001-3 576460752303423489-864691128455135231
> Range oss001-4 864691128455135232-1152921504606846974
> Range oss002-1 1152921504606846975-1441151880758558717
> Range oss002-2 1441151880758558718-1729382256910270460
> Range oss002-3 1729382256910270461-2017612633061982203
> Range oss002-4 2017612633061982204-2305843009213693946
> Range oss003-1 2305843009213693947-2594073385365405689
> Range oss003-2 2594073385365405690-2882303761517117432
> Range oss003-3 2882303761517117433-3170534137668829175
> Range oss003-4 3170534137668829176-3458764513820540918
> Range oss004-1 3458764513820540919-3746994889972252661
> Range oss004-2 3746994889972252662-4035225266123964404
> Range oss004-3 4035225266123964405-4323455642275676147
> Range oss004-4 4323455642275676148-4611686018427387890
> </MetaHandleRanges>
> <DataHandleRanges>
> Range oss001-1 4611686018427387891-4899916394579099633
> Range oss001-2 4899916394579099634-5188146770730811376
> Range oss001-3 5188146770730811377-5476377146882523119
> Range oss001-4 5476377146882523120-5764607523034234862
> Range oss002-1 5764607523034234863-6052837899185946605
> Range oss002-2 6052837899185946606-6341068275337658348
> Range oss002-3 6341068275337658349-6629298651489370091
> Range oss002-4 6629298651489370092-6917529027641081834
> Range oss003-1 6917529027641081835-7205759403792793577
> Range oss003-2 7205759403792793578-7493989779944505320
> Range oss003-3 7493989779944505321-7782220156096217063
> Range oss003-4 7782220156096217064-8070450532247928806
> Range oss004-1 8070450532247928807-8358680908399640549
> Range oss004-2 8358680908399640550-8646911284551352292
> Range oss004-3 8646911284551352293-8935141660703064035
> Range oss004-4 8935141660703064036-9223372036854775778
> </DataHandleRanges>
> <StorageHints>
> TroveSyncMeta no
> TroveSyncData no
> TroveMethod alt-aio
> </StorageHints>
> <Distribution>
> Name simple_stripe
> Param strip_size
> Value 1048576
> </Distribution>
> </Filesystem>
>
>
> Thanks,
> Randy
>
> ------------------------------------------------------------------------
> _______________________________________________
> Pvfs2-users mailing list
> Pvfs2-users at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iterate-error.patch
Type: text/x-patch
Size: 1821 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-users/attachments/20090629/a038fe1e/iterate-error.bin