[Pvfs2-users] I/O server won't start

Eric J. Walter ejwalt at wm.edu
Tue Feb 9 15:28:32 EST 2010


Kevin,

Hi, I have done what you suggested and repeated the db_dump and db_load.

The db_verify of dataspace_attributes.db produces no errors, and the
pvfs2-server starts with no errors.  Unfortunately, the clients can't
seem to communicate with the servers after mounting:

 >>> /share/apps/pvfs-2.8.1/bin/pvfs2-fsck -v -m /mnt/pvfs2
[E 15:20:09.068943] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 12.
[E 15:20:09.069756] Warning: msgpair failed to ib://pvfs-2:3335, will retry: Connection timed out
[E 15:20:09.069808] *** msgpairarray_completion_fn: msgpair to server [UNKNOWN] failed: Connection timed out
[E 15:20:09.069829] *** Non-BMI failure.
[E 15:20:09.069859] ERROR: could not initialize any file systems in /etc/pvfs2tab.
PVFS_util_init_defaults: No such device (error class: 0)

The same thing happens for any command (e.g. pvfs2-ls, pvfs2-statfs, etc.).

Perhaps there is something I am missing?

Eric


Kevin Harms wrote:
> Eric,
>
>   I'm not sure exactly what is wrong with your .db, but before you can
> use db_load it needs to be modified so that it adds the keys back in
> the correct "sorted" order, where "sorted" means the order PVFS
> expects. You need to modify db_load.c to something like this:
>
> if ((ret = dbp->set_bt_compare(dbp, PINT_trove_dbpf_ds_attr_compare)) != 0) {
>         dbp->err(dbp, ret, "DB->set_bt_compare");
>         goto err;
> }
>
> Then paste the PINT_trove_dbpf_ds_attr_compare function and associated
> data structure definitions into the db_load.c source as well. You
> should take db_load.c from the source of the particular version of bdb
> you're using.
>
> kevin
>
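
A minimal sketch of why the comparison function matters for db_load. When no comparison function is registered, a Berkeley DB btree orders keys lexicographically by their raw bytes. If the keys are 64-bit integers stored in little-endian byte order (an assumption here, as on x86 hosts), that byte order differs from numeric order, so keys reloaded with the default comparison end up sorted differently from what a numeric comparison expects. The demo_* names below are hypothetical, and the numeric comparison only stands in for PINT_trove_dbpf_ds_attr_compare; the real function should be copied from the PVFS source as described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-bit handle key; not the PVFS type. */
typedef uint64_t demo_handle;

/* Serialize a handle as 8 little-endian bytes (the x86 in-memory layout). */
void demo_store_le64(demo_handle value, unsigned char out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)((value >> (8 * i)) & 0xffu);
    }
}

/* Byte-wise (lexicographic) ordering of the serialized keys, which is how
 * a btree orders keys when no custom comparison is registered.
 * Returns <0, 0, or >0. */
int demo_compare_bytes(demo_handle a, demo_handle b)
{
    unsigned char raw_a[8], raw_b[8];
    demo_store_le64(a, raw_a);
    demo_store_le64(b, raw_b);
    return memcmp(raw_a, raw_b, sizeof raw_a);
}

/* Numeric ordering: an assumed illustration of what a handle-aware
 * comparison might do. Returns -1, 0, or 1. */
int demo_compare_numeric(demo_handle a, demo_handle b)
{
    if (a == b) {
        return 0;
    }
    return (a < b) ? -1 : 1;
}

/* Returns 1 if the two orderings rank the pair differently, else 0. */
int demo_orders_disagree(demo_handle a, demo_handle b)
{
    int by_bytes = demo_compare_bytes(a, b);
    int by_value = demo_compare_numeric(a, b);
    return ((by_bytes < 0) != (by_value < 0)) ||
           ((by_bytes == 0) != (by_value == 0));
}
```

For example, the handles 1 and 256 are ranked one way by value and the opposite way by their little-endian bytes, so demo_orders_disagree(1, 256) reports a mismatch, while 1 and 2 are ranked the same way by both. A database loaded with the wrong ordering can contain every key and still be unusable by a server that searches it with a different comparison.
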
> On Feb 8, 2010, at 7:16 PM, Eric J. Walter wrote:
>
>>
>>
>> Hi,
>>
>> I have a problem starting up an I/O node.  It is one of 3 servers that
>> we run v2.8.1 on over InfiniBand.  It is not used for metadata.  After
>> finding a file which had '?--?--?'-like permissions, I decided to
>> restart the pvfs servers and remount all of the clients.  Now, one of
>> the three I/O nodes can't start its pvfs2-server.  The other two start
>> correctly.
>>
>> Here is the server log from the problem server:
>>
>> [D 02/08 19:40] PVFS2 Server version 2.8.1 starting.
>> [E 02/08 19:40] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>> [E 02/08 19:40] Error adding handle range 1537228672809129303-3074457345618258602,6148914691236517203-7686143364045646502 to filesystem pvfs2-fs
>> [E 02/08 19:40] Error: Could not initialize server interfaces; aborting.
>> [E 02/08 19:40] Error: Could not initialize server; aborting.
>>
>> I am also using db4-4.2.52-7.1 of the DB software.  Reading through
>> previous mailing list discussions, I found that running db_recover on
>> the .db files (after backing them up) could be helpful.  The only .db
>> file which has any problems with verify is dataspace_attributes.db on
>> the problem I/O node.  Here is what it reports:
>>
>> # db_verify -o dataspace_attributes.db
>> db_verify: Page 865: item 57 of unrecognizable type
>> db_verify: Page 865: gap between items at offset 1376
>> db_verify: Page 865: item order check unsafe: skipping
>> db_verify: DB->verify: dataspace_attributes.db: DB_VERIFY_BAD: Database verification failed
>>
>> So I tried db_recover -v in the same directory and in the directory
>> above (I am not sure where to run it) and all I get is:
>>
>> db_recover: Finding last valid log LSN: file: 1 offset 28
>>
>> and a small binary file named "log.0000000001".
>>
>> This step seems to do nothing, i.e. the db_verify report doesn't change
>> after this.
>>
>> I have also tried db_dump -r followed by db_load and this also does not
>> change the db_verify output.
>>
>> Is there anything else I can do except wipe the filesystem and rebuild?
>>
>> Thanks for any help I can get.
>>
>> Eric J. Walter
>> Department of Physics
>> College of William and Mary
>>
>>
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>


