[Pvfs2-users] Help: pvfs2-server won't start, errors detected
Phil Carns
carns at mcs.anl.gov
Wed Apr 4 16:03:33 EDT 2012
Here is a newer copy:
http://www.orangefs.org/fisheye/orangefs/browse/~raw,r=9138/orangefs/trunk/doc/db-recovery.txt
-Phil
On 04/04/2012 03:58 PM, Jim Kusznir wrote:
> Hmm...my db_recovery docs say: "This example only works for the
> keyval.db file. The dataspace_attributes.db file requires a different
> modification (not provided here)." The file I'm having trouble with is
> the dataspace_attributes.db.
>
> --Jim
>
> On Wed, Apr 4, 2012 at 11:04 AM, Phil Carns<carns at mcs.anl.gov> wrote:
>> Another option to consider is the technique described in
>> pvfs2/doc/db-recovery.txt. It describes how to dump and reload two types of
>> db files. The latter is the one you want in this case
>> (dataspace_attributes.db). Please make a backup copy of the original .db
>> file if you try this.
>>
>> One thing to look out for that isn't mentioned in the doc is that the
>> rebuilt dataspace_attributes.db will probably be _much_ smaller than the
>> original. This doesn't mean that it lost data; it's just that Berkeley DB
>> will pack it much more efficiently when all of the entries are rebuilt at
>> once.
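
A minimal sketch of the generic dump-and-reload shape, assuming the standard Berkeley DB utilities db_dump and db_load are on the PATH. It is not the procedure from db-recovery.txt: that document describes PVFS-specific modifications that are not reproduced here, so treat this only as an outline of the steps (back up, dump, reload into a new file). The function names, the working directory, and the ".orig", ".dump", and ".rebuilt" suffixes are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def plan_dump_reload(db_path, work_dir):
    """Return (backup, dump, rebuilt, commands) for a generic dump/reload."""
    db_path = Path(db_path)
    work_dir = Path(work_dir)
    backup = work_dir / (db_path.name + ".orig")      # hypothetical naming
    dump = work_dir / (db_path.name + ".dump")
    rebuilt = work_dir / (db_path.name + ".rebuilt")
    commands = [
        # db_dump -f <output> <database>: write the records out as text.
        ["db_dump", "-f", str(dump), str(db_path)],
        # db_load -f <input> <database>: build a fresh database from that text.
        ["db_load", "-f", str(dump), str(rebuilt)],
    ]
    return backup, dump, rebuilt, commands


def run_dump_reload(db_path, work_dir):
    """Back up the original file, then run the planned commands in order."""
    backup, _dump, rebuilt, commands = plan_dump_reload(db_path, work_dir)
    shutil.copy2(db_path, backup)  # keep a copy of the original .db first
    for argv in commands:
        subprocess.run(argv, check=True)  # stop at the first failing step
    return rebuilt
```

Because the reload writes to a separate file, the original .db is never modified, and the rebuilt file can be compared with it (including the size difference mentioned above) before anything is swapped in.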
>>
>> -Phil
>>
>>
>> On 04/02/2012 01:09 PM, Jim Kusznir wrote:
>>> Thanks Boyd:
>>>
>>> We have 3 io servers, each also running metadata servers. One will
>>> not come up (that's the 3rd server). I did try and run the db check
>>> command (forget the specifics), and it did return a single chunk of
>>> entries that are not readable. As you may guess from the above, I've
>>> never interacted with bdb on a direct or low level. I don't have a
>>> good answer for #3; I noticed about 1/3 of the directory entries were
>>> "red" on the terminal, and several individuals contacted me with pvfs
>>> problems.
>>>
>>> I will begin building new versions of bdb. Do I need to install this
>>> just on the servers, or do the clients need it as well?
>>>
>>> --Jim
>>>
>>> On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson<boydw at omnibond.com> wrote:
>>>> Jim,
>>>> We have been discussing your issue internally. A few questions:
>>>> 1. How many metadata servers do you have?
>>>> 2. Do you know which one is affected (if there is more than one)?
>>>> 3. How much of the file system can you currently see?
>>>>
>>>> The issue you mentioned seems to be the one we have seen with earlier
>>>> versions of BerkeleyDB; as Becky mentioned, we have not seen it with the
>>>> newer versions. In our discussions we couldn't recall whether we had tried
>>>> low-level BDB access to the metadata to back up the unaffected entries so
>>>> they could be restored into a new BDB. If you are comfortable with
>>>> lower-level BDB commands, you may want to see whether you can read the
>>>> entries both up to the corruption and after it. If you can do both, you
>>>> may be able to write a small program that reads all the entries out into
>>>> a file or another BDB, then rebuild the BDB with the valid entries.
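
The "small program" idea can be sketched as a loop that copies every readable entry and skips the ones that fail. This is only an illustration of the control flow: it makes no Berkeley DB calls, and read_entry is a hypothetical callable standing in for whatever low-level access is used. A real cursor may not be able to step past a damaged page, in which case it would need to be repositioned after the corruption.

```python
def salvage_entries(read_entry, total):
    """Copy every readable entry; record the positions that fail.

    read_entry(i) is a hypothetical callable that returns a (key, value)
    pair for position i, or raises an exception if that entry is damaged.
    """
    good = []
    bad_positions = []
    for i in range(total):
        try:
            good.append(read_entry(i))
        except Exception:
            bad_positions.append(i)  # skip the damaged entry and keep going
    return good, bad_positions


# Stand-in for a damaged database: positions 2 and 3 cannot be read.
def fake_reader(i):
    if i in (2, 3):
        raise ValueError("unreadable entry")
    return (f"key{i}", f"value{i}")


good, bad = salvage_entries(fake_reader, 6)
print(len(good), bad)
```

Keeping the list of failed positions makes it possible to report afterwards how many entries could not be carried over into the rebuilt database.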
>>>>
>>>> thx
>>>> -boyd
>>>>
>>>> On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon<ligon at omnibond.com> wrote:
>>>>> Jim:
>>>>>
>>>>> I understand your situation. Here at Clemson University, we went through
>>>>> the same situation a couple of years ago. Now, we back up the metadata
>>>>> databases. We don't have the space to back up our data either!
>>>>>
>>>>> Under no circumstances should you run pvfs2-fsck. If you run this command
>>>>> in the destructive mode, we won't be able to help at all. If you're
>>>>> willing, Omnibond MAY be able to write some utilities that will help you
>>>>> recover most of the data. You will have to speak to Boyd Wilson
>>>>> (boyd.wilson at omnibond.com) and work something out.
>>>>>
>>>>> Becky Ligon
>>>>>
>>>>>
>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir<jkusznir at gmail.com> wrote:
>>>>>> I made no changes to my environment; it was up and running just fine.
>>>>>> I ran db_recover, and it immediately returned, with no apparent sign
>>>>>> of doing anything but creating a log.000000001 file.
>>>>>>
>>>>>> I have the CentOS DB installed, db4-4.3.29-10.el5.
>>>>>>
>>>>>> I have no backups; this is my high performance filesystem of 99TB; it
>>>>>> is the largest disk we have, and we therefore have no means of backing
>>>>>> it up. We don't have anything big enough to hold that much data.
>>>>>>
>>>>>> Is there any hope? Can we just identify and delete the files that
>>>>>> have the db damage on them? (Note that I don't even have anywhere to
>>>>>> back up this data to temporarily if we do get it running, so I'd need
>>>>>> to "fix in place".)
>>>>>>
>>>>>> thanks!
>>>>>> --Jim
>>>>>>
>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon<ligon at omnibond.com> wrote:
>>>>>>> Jim:
>>>>>>>
>>>>>>> If you haven't made any recent changes to your PVFS environment or
>>>>>>> Berkeley DB installation, then it looks like you have a corrupted
>>>>>>> metadata database. There is no easy way to recover. Sometimes the
>>>>>>> Berkeley DB command "db_recover" might work, but PVFS doesn't have
>>>>>>> transactions turned on, so normally it doesn't. It's worth a try,
>>>>>>> just to be sure.
>>>>>>>
>>>>>>> Do you have any recent backups of the databases? If so, then you will
>>>>>>> need to use a set of backups that were created around the same time, so
>>>>>>> the databases will be somewhat consistent with each other.
>>>>>>>
>>>>>>> Which version of Berkeley DB are you using? We have had corruption
>>>>>>> issues with older versions of it. We strongly recommend 4.8 or higher.
>>>>>>> There are some known problems with threads in the older versions.
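
As a quick illustration of checking an installed package against the 4.8 recommendation, the sketch below parses the version out of a package string such as the db4-4.3.29-10.el5 reported elsewhere in this thread. The function names and the parsing rule (first "-major.minor.patch" group in the string) are assumptions made for this example, not part of any PVFS or Berkeley DB tool.

```python
import re


def bdb_version(package):
    """Extract (major, minor, patch) from a string like 'db4-4.3.29-10.el5'."""
    match = re.search(r"-(\d+)\.(\d+)\.(\d+)", package)
    if match is None:
        raise ValueError(f"no version found in {package!r}")
    return tuple(int(part) for part in match.groups())


def meets_recommendation(package, minimum=(4, 8)):
    """True if the package's major.minor is at least the recommended one."""
    return bdb_version(package)[:2] >= minimum


print(bdb_version("db4-4.3.29-10.el5"))
print(meets_recommendation("db4-4.3.29-10.el5"))
```

Comparing tuples of integers avoids the string-comparison trap where "4.10" would sort below "4.8".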
>>>>>>>
>>>>>>> Becky Ligon
>>>>>>>
>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir<jkusznir at gmail.com> wrote:
>>>>>>>> Hi all:
>>>>>>>>
>>>>>>>> I got some notices from my users about "weirdness with pvfs2" this
>>>>>>>> morning, and went and investigated. Eventually, I found the following
>>>>>>>> on one of my 3 servers:
>>>>>>>>
>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>>>
>>>>>>>> ------------
>>>>>>>> pvfs2-fs.conf:
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> <Defaults>
>>>>>>>> UnexpectedRequests 50
>>>>>>>> EventLogging none
>>>>>>>> LogStamp datetime
>>>>>>>> BMIModules bmi_tcp
>>>>>>>> FlowModules flowproto_multiqueue
>>>>>>>> PerfUpdateInterval 1000
>>>>>>>> ServerJobBMITimeoutSecs 30
>>>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>>>> ClientJobBMITimeoutSecs 300
>>>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>>>> ClientRetryLimit 5
>>>>>>>> ClientRetryDelayMilliSecs 2000
>>>>>>>> StorageSpace /mnt/pvfs2
>>>>>>>> LogFile /var/log/pvfs2-server.log
>>>>>>>> </Defaults>
>>>>>>>>
>>>>>>>> <Aliases>
>>>>>>>> Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>> Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>> Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>> </Aliases>
>>>>>>>>
>>>>>>>> <Filesystem>
>>>>>>>> Name pvfs2-fs
>>>>>>>> ID 62659950
>>>>>>>> RootHandle 1048576
>>>>>>>> <MetaHandleRanges>
>>>>>>>> Range pvfs2-io-0-0 4-715827885
>>>>>>>> Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>> Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>> </MetaHandleRanges>
>>>>>>>> <DataHandleRanges>
>>>>>>>> Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>> Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>> Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>> </DataHandleRanges>
>>>>>>>> <StorageHints>
>>>>>>>> TroveSyncMeta yes
>>>>>>>> TroveSyncData no
>>>>>>>> </StorageHints>
>>>>>>>> </Filesystem>
>>>>>>>> -------------
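
To relate the handle range in the error message to the configuration, the sketch below copies the MetaHandleRanges and DataHandleRanges values from the pvfs2-fs.conf above and looks up which server owns a given handle. The helper names owner_of and ranges_for are hypothetical, and the ranges are treated as inclusive on both ends, which is an assumption based on how adjacent ranges in the config meet without overlapping.

```python
# Handle ranges copied from the pvfs2-fs.conf shown above.
META_RANGES = {
    "pvfs2-io-0-0": (4, 715827885),
    "pvfs2-io-0-1": (715827886, 1431655767),
    "pvfs2-io-0-2": (1431655768, 2147483649),
}
DATA_RANGES = {
    "pvfs2-io-0-0": (2147483650, 2863311531),
    "pvfs2-io-0-1": (2863311532, 3579139413),
    "pvfs2-io-0-2": (3579139414, 4294967295),
}


def owner_of(handle):
    """Return (server, kind) for the range containing handle, or None."""
    for kind, ranges in (("meta", META_RANGES), ("data", DATA_RANGES)):
        for server, (low, high) in ranges.items():
            if low <= handle <= high:  # assumed inclusive on both ends
                return server, kind
    return None


def ranges_for(server):
    """Format a server's meta and data ranges the way the log line prints them."""
    low_m, high_m = META_RANGES[server]
    low_d, high_d = DATA_RANGES[server]
    return f"{low_m}-{high_m},{low_d}-{high_d}"


print(ranges_for("pvfs2-io-0-2"))
print(owner_of(1048576))  # the RootHandle from the config
```

The string produced for pvfs2-io-0-2 matches the range in the "Error adding handle range" log line, which is consistent with the failure being confined to that server's database. Under the same assumptions, the RootHandle falls in the metadata range of pvfs2-io-0-0.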
>>>>>>>> Any suggestions for recovery?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> --Jim
>>>>>>>> _______________________________________________
>>>>>>>> Pvfs2-users mailing list
>>>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Becky Ligon
>>>>>>> OrangeFS Support and Development
>>>>>>> Omnibond Systems
>>>>>>> Anderson, South Carolina
>>>>>>>
>>>>>>>