[Pvfs2-users] I/O server won't start
Kevin Harms
harms at alcf.anl.gov
Tue Feb 9 17:02:06 EST 2010
Eric,
So I take it the "repaired" database allowed the pvfs2-server to
start? Based on this, it looks like it may have hit a fatal error
soon afterward, since the pvfs2-fsck command could not connect to it.
What does the pvfs2-server log say?
kevin
On Feb 9, 2010, at 2:28 PM, Eric J. Walter wrote:
>
> Kevin,
>
> Hi, I have done what you have said and repeated the db_dump and
> db_load.
>
> The db_verify of dataspace_attributes.db produces no errors and the
> pvfs2-server starts with no
> errors. Unfortunately, the clients can't seem to communicate with
> the servers after mounting:
>
> >>> /share/apps/pvfs-2.8.1/bin/pvfs2-fsck -v -m /mnt/pvfs2
> [E 15:20:09.068943] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 12.
> [E 15:20:09.069756] Warning: msgpair failed to ib://pvfs-2:3335, will retry: Connection timed out
> [E 15:20:09.069808] *** msgpairarray_completion_fn: msgpair to server [UNKNOWN] failed: Connection timed out
> [E 15:20:09.069829] *** Non-BMI failure.
> [E 15:20:09.069859] ERROR: could not initialize any file systems in /etc/pvfs2tab. PVFS_util_init_defaults: No such device (error class: 0)
> This same thing happens for any command (e.g. pvfs2-ls, pvfs2-statfs,
> etc.)
>
> Perhaps there is something I am missing?
>
> Eric
>
>
> Kevin Harms wrote:
>> Eric,
>>
>> I'm not sure exactly what is wrong with your .db, but to use
>> db_load, it needs to be modified to add the keys back in the
>> correct "sorted" order, where "sorted" means the order PVFS
>> expects. You need to modify db_load.c to something like this:
>>
>>     if ((ret = dbp->set_bt_compare(dbp,
>>             PINT_trove_dbpf_ds_attr_compare)) != 0) {
>>         dbp->err(dbp, ret, "DB->set_bt_compare");
>>         goto err;
>>     }
>>
>> Then paste the PINT_trove_dbpf_ds_attr_compare function and
>> associated data structure definitions into the db_load.c source as
>> well. You should get db_load.c from the particular version of
>> bdb you're using.
>>
>> kevin
>>
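To illustrate why the comparison function matters here: when no comparator is set, Berkeley DB orders btree keys as byte strings, which on a little-endian host is not the same as numeric order for integer keys. The following is a minimal sketch, not the PVFS code itself. It assumes the keys are 64-bit handles, and `sketch_dbt`, `handle_compare`, and `bytewise_compare` are hypothetical names introduced only for this example (the real `PINT_trove_dbpf_ds_attr_compare` should be copied from your PVFS source tree, and the real `DBT` comes from `db.h`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Berkeley DB's DBT so the sketch builds without db.h. */
typedef struct {
    const void *data;
    uint32_t size;
} sketch_dbt;

/* Numeric ordering on keys assumed to be 64-bit handles (an assumption). */
int handle_compare(const sketch_dbt *a, const sketch_dbt *b)
{
    uint64_t ha, hb;

    assert(a->size == sizeof ha && b->size == sizeof hb);
    memcpy(&ha, a->data, sizeof ha);   /* memcpy avoids unaligned reads */
    memcpy(&hb, b->data, sizeof hb);

    if (ha == hb)
        return 0;
    return (ha < hb) ? -1 : 1;
}

/* Byte-wise ordering, shorter keys first: the kind of ordering used
 * when no custom comparator is installed. */
int bytewise_compare(const sketch_dbt *a, const sketch_dbt *b)
{
    uint32_t n = (a->size < b->size) ? a->size : b->size;
    int r = memcmp(a->data, b->data, n);

    if (r != 0)
        return (r < 0) ? -1 : 1;
    if (a->size == b->size)
        return 0;
    return (a->size < b->size) ? -1 : 1;
}
```

For the keys 256 and 1, `handle_compare` always orders 1 first, while on a little-endian machine `bytewise_compare` orders 256 first because its lowest-addressed byte is zero. A database reloaded with the wrong ordering can therefore look valid to a stock db_load but not to a server that expects its own ordering.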
>> On Feb 8, 2010, at 7:16 PM, Eric J. Walter wrote:
>>
>>>
>>>
>>> Hi,
>>>
>>> I have a problem starting up an I/O node. It is one of 3 servers
>>> that we run v2.8.1 on over InfiniBand. It is not used for metadata.
>>> After finding a file which had '?--?--?'-like permissions, I decided
>>> to restart the pvfs servers and remount all of the clients. Now, one
>>> of the three I/O nodes can't start its pvfs2-server.
>>> The other two start correctly.
>>>
>>> Here is the server log from the problem server:
>>>
>>> [D 02/08 19:40] PVFS2 Server version 2.8.1 starting.
>>> [E 02/08 19:40] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>> [E 02/08 19:40] Error adding handle range 1537228672809129303-3074457345618258602,6148914691236517203-7686143364045646502 to filesystem pvfs2-fs
>>> [E 02/08 19:40] Error: Could not initialize server interfaces; aborting.
>>> [E 02/08 19:40] Error: Could not initialize server; aborting.
>>>
>>> I am using version db4-4.2.52-7.1 of the DB software. Reading
>>> through previous mailing list discussions, I found that running
>>> db_recover on the .db files (after backing them up) could be
>>> helpful. The only .db file which has any problems with verify is
>>> dataspace_attributes.db on the problem I/O node. Here is what it
>>> reports:
>>>
>>> # db_verify -o dataspace_attributes.db
>>> db_verify: Page 865: item 57 of unrecognizable type
>>> db_verify: Page 865: gap between items at offset 1376
>>> db_verify: Page 865: item order check unsafe: skipping
>>> db_verify: DB->verify: dataspace_attributes.db: DB_VERIFY_BAD: Database verification failed
>>>
>>> So I tried db_recover -v in the same directory and in the directory
>>> above (I am not sure where to run it) and all I get is:
>>>
>>> db_recover: Finding last valid log LSN: file: 1 offset 28
>>>
>>> and a small binary file named "log.0000000001".
>>>
>>> This step seems to do nothing, i.e. the db_verify report doesn't
>>> change after this.
>>>
>>> I have also tried db_dump -r followed by db_load and this also
>>> does not change the db_verify output.
>>>
>>> Is there anything else I can do except wipe the filesystem and
>>> rebuild?
>>>
>>> Thanks for any help I can get.
>>>
>>> Eric J. Walter
>>> Department of Physics
>>> College of William and Mary
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Pvfs2-users mailing list
>>> Pvfs2-users at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>