[Pvfs2-users] Help: pvfs2-server won't start, errors detected

Phil Carns carns at mcs.anl.gov
Wed Apr 4 16:03:33 EDT 2012


Here is a newer copy:

http://www.orangefs.org/fisheye/orangefs/browse/~raw,r=9138/orangefs/trunk/doc/db-recovery.txt

-Phil

On 04/04/2012 03:58 PM, Jim Kusznir wrote:
> Hmm...my db-recovery doc says "This example only works for the
> keyval.db file.  The dataspace_attributes.db file requires a
> different modification (not provided here)."  The file I'm having
> trouble with is dataspace_attributes.db.
>
> --Jim
>
> On Wed, Apr 4, 2012 at 11:04 AM, Phil Carns<carns at mcs.anl.gov>  wrote:
>> Another option to consider is the technique described in
>> pvfs2/doc/db-recovery.txt.  It describes how to dump and reload two types of
>> db files.  The latter is the one you want in this case
>> (dataspace_attributes.db).  Please make a backup copy of the original .db
>> file if you try this.
>>
>> One thing to look out for that isn't mentioned in the doc is that the
>> rebuilt dataspace_attributes.db will probably be _much_ smaller than the
>> original.  This doesn't mean that it lost data; it's just that Berkeley DB
>> will pack it much more efficiently when all of the entries are rebuilt at
>> once.
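>>
>> If you want to double-check the rebuilt file, one simple test is to walk
>> it with a cursor and count the entries, and compare that against however
>> many records the dump step produced.  A rough, untested sketch against the
>> Berkeley DB C API (the file name is whatever you pass on the command line):
>>
>>     /* count_entries.c: walk a Berkeley DB file with a cursor and count
>>      * the key/value pairs it contains.  Compile with -ldb. */
>>     #include <stdio.h>
>>     #include <string.h>
>>     #include <db.h>
>>
>>     int main(int argc, char **argv)
>>     {
>>         DB *db;
>>         DBC *cur;
>>         DBT key, data;
>>         long count = 0;
>>         int ret;
>>
>>         if (argc != 2) {
>>             fprintf(stderr, "usage: %s <file.db>\n", argv[0]);
>>             return 1;
>>         }
>>         if (db_create(&db, NULL, 0) != 0)
>>             return 1;
>>         /* open read-only; DB_UNKNOWN lets the library detect the type */
>>         if (db->open(db, NULL, argv[1], NULL, DB_UNKNOWN, DB_RDONLY, 0) != 0) {
>>             fprintf(stderr, "could not open %s\n", argv[1]);
>>             return 1;
>>         }
>>         db->cursor(db, NULL, &cur, 0);
>>         memset(&key, 0, sizeof(key));
>>         memset(&data, 0, sizeof(data));
>>         while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0)
>>             count++;
>>         if (ret != DB_NOTFOUND)
>>             fprintf(stderr, "cursor stopped early: %s\n", db_strerror(ret));
>>         printf("%ld entries\n", count);
>>         cur->c_close(cur);
>>         db->close(db, 0);
>>         return 0;
>>     }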
>>
>> -Phil
>>
>>
>> On 04/02/2012 01:09 PM, Jim Kusznir wrote:
>>> Thanks Boyd:
>>>
>>> We have 3 io servers, each also running metadata servers.  One will
>>> not come up (that's the 3rd server).  I did try to run the db check
>>> command (I forget the specifics), and it did return a single chunk of
>>> entries that are not readable.  As you may guess from the above, I've
>>> never interacted with bdb on a direct or low level.  I don't have a
>>> good answer for #3; I noticed about 1/3 of the directory entries were
>>> "red" on the terminal, and several individuals contacted me with pvfs
>>> problems.
>>>
>>> I will begin building a newer version of BDB.  Do I need to install this
>>> just on the servers, or do the clients need it as well?
>>>
>>> --Jim
>>>
>>> On Sun, Apr 1, 2012 at 4:03 PM, Boyd Wilson<boydw at omnibond.com>    wrote:
>>>> Jim,
>>>> We have been discussing your issue internally.   A few questions:
>>>> 1. How many metadata servers do you have?
>>>> 2. Do you know which one is affected (if there is more than one)?
>>>> 3. How much of the file system can you currently see?
>>>>
>>>> The issue you mentioned seems to be the one we have seen with earlier
>>>> versions of Berkeley DB; as Becky mentioned, we have not seen it with
>>>> the newer versions.  In our discussions we couldn't recall whether we
>>>> have ever tried low-level BDB access to the metadata to back up the
>>>> unaffected entries so they can be restored into a new BDB.  If you are
>>>> comfortable with lower-level BDB commands, you may want to see whether
>>>> you can read the entries up to the corruption and the entries after it.
>>>> If you can do both, you may be able to write a small program that reads
>>>> all of the valid entries out into a file or another BDB and then
>>>> rebuilds the BDB from them.
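>>>>
>>>> A rough, untested sketch of that kind of salvage pass (the file names are
>>>> examples only -- adjust for your storage space, stop the pvfs2-server,
>>>> and keep a copy of the original first):
>>>>
>>>>     /* salvage.c: copy readable key/value pairs from a damaged Berkeley DB
>>>>      * file into a fresh one.  Compile with -ldb and run it offline. */
>>>>     #include <stdio.h>
>>>>     #include <string.h>
>>>>     #include <db.h>
>>>>
>>>>     int main(void)
>>>>     {
>>>>         DB *src, *dst;
>>>>         DBC *cur;
>>>>         DBT key, data;
>>>>         long copied = 0;
>>>>         int ret;
>>>>
>>>>         db_create(&src, NULL, 0);
>>>>         db_create(&dst, NULL, 0);
>>>>
>>>>         /* open the damaged database read-only (DB_UNKNOWN = autodetect) */
>>>>         if (src->open(src, NULL, "dataspace_attributes.db", NULL,
>>>>                       DB_UNKNOWN, DB_RDONLY, 0) != 0)
>>>>             return 1;
>>>>         /* the replacement; if PVFS created the original with special flags
>>>>          * or a custom comparison function, match that here as well */
>>>>         if (dst->open(dst, NULL, "dataspace_attributes.rebuilt.db", NULL,
>>>>                       DB_BTREE, DB_CREATE | DB_EXCL, 0600) != 0)
>>>>             return 1;
>>>>
>>>>         src->cursor(src, NULL, &cur, 0);
>>>>         memset(&key, 0, sizeof(key));
>>>>         memset(&data, 0, sizeof(data));
>>>>
>>>>         while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
>>>>             dst->put(dst, NULL, &key, &data, 0);
>>>>             copied++;
>>>>         }
>>>>         if (ret != DB_NOTFOUND) {
>>>>             /* hit the corrupt region: report it; a second pass could use
>>>>              * DB_SET_RANGE with a key beyond the bad spot to pick up the
>>>>              * entries that come after it */
>>>>             fprintf(stderr, "stopped after %ld entries: %s\n",
>>>>                     copied, db_strerror(ret));
>>>>         }
>>>>
>>>>         printf("copied %ld entries\n", copied);
>>>>         cur->c_close(cur);
>>>>         src->close(src, 0);
>>>>         dst->close(dst, 0);
>>>>         return 0;
>>>>     }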
>>>>
>>>> thx
>>>> -boyd
>>>>
>>>> On Sat, Mar 31, 2012 at 6:07 PM, Becky Ligon<ligon at omnibond.com>    wrote:
>>>>> Jim:
>>>>>
>>>>> I understand your situation.  Here at Clemson University, we went
>>>>> through the same thing a couple of years ago.  Now we back up the
>>>>> metadata databases.  We don't have the space to back up our data either!
>>>>>
>>>>> Under no circumstances should you run pvfs2-fsck in destructive mode; if
>>>>> you do, we won't be able to help at all.  If you're willing, Omnibond MAY
>>>>> be able to write some utilities to help you recover most of the data.
>>>>> You will have to speak to Boyd Wilson (boyd.wilson at omnibond.com) and
>>>>> work out something.
>>>>>
>>>>> Becky Ligon
>>>>>
>>>>>
>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir<jkusznir at gmail.com>    wrote:
>>>>>> I made no changes to my environment; it was up and running just fine.
>>>>>> I ran db_recover, and it immediately returned, with no apparent sign
>>>>>> of doing anything but creating a log.000000001 file.
>>>>>>
>>>>>> I have the CentOS DB package installed: db4-4.3.29-10.el5.
>>>>>>
>>>>>> I have no backups; this is my high-performance filesystem of 99 TB.  It
>>>>>> is the largest storage we have, so we have no means of backing it up.
>>>>>> We don't have anything big enough to hold that much data.
>>>>>>
>>>>>> Is there any hope?  Can we just identify and delete the files that
>>>>>> have the db damage on them?  (Note that I don't even have anywhere to
>>>>>> temporarily back up this data if we do get it running, so I'd need to
>>>>>> "fix in place".)
>>>>>>
>>>>>> thanks!
>>>>>> --Jim
>>>>>>
>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon<ligon at omnibond.com>
>>>>>>   wrote:
>>>>>>> Jim:
>>>>>>>
>>>>>>> If you haven't made any recent changes to your pvfs environment or
>>>>>>> Berkeley DB installation, then it looks like you have a corrupted
>>>>>>> metadata database.  There is no way to easily recover.  Sometimes the
>>>>>>> Berkeley DB command "db_recover" might work, but PVFS doesn't have
>>>>>>> transactions turned on, so normally it doesn't.  It's worth a try,
>>>>>>> just to be sure.
>>>>>>>
>>>>>>> Do you have any recent backups of the databases?  If so, then you
>>>>>>> will need to use a set of backups that were created around the same
>>>>>>> time, so the databases will be somewhat consistent with each other.
>>>>>>>
>>>>>>> Which version of Berkeley DB are you using?  We have had corruption
>>>>>>> issues with older versions of it.  We strongly recommend 4.8 or
>>>>>>> higher.  There are some known problems with threads in the older
>>>>>>> versions.
>>>>>>>
>>>>>>> Becky Ligon
>>>>>>>
>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir<jkusznir at gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi all:
>>>>>>>>
>>>>>>>> I got some notices from my users about "weirdness with pvfs2" this
>>>>>>>> morning and went and investigated.  Eventually, I found the following
>>>>>>>> on one of my 3 servers:
>>>>>>>>
>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2
>>>>>>>> starting...
>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in
>>>>>>>> dbpf_dspace_iterate_handles().
>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>> [E 03/30 12:23] Error adding handle range
>>>>>>>> 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces;
>>>>>>>> aborting.
>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>>>
>>>>>>>> ------------
>>>>>>>> pvfs2-fs.conf:
>>>>>>>> -----------
>>>>>>>>
>>>>>>>> <Defaults>
>>>>>>>>         UnexpectedRequests 50
>>>>>>>>         EventLogging none
>>>>>>>>         LogStamp datetime
>>>>>>>>         BMIModules bmi_tcp
>>>>>>>>         FlowModules flowproto_multiqueue
>>>>>>>>         PerfUpdateInterval 1000
>>>>>>>>         ServerJobBMITimeoutSecs 30
>>>>>>>>         ServerJobFlowTimeoutSecs 30
>>>>>>>>         ClientJobBMITimeoutSecs 300
>>>>>>>>         ClientJobFlowTimeoutSecs 300
>>>>>>>>         ClientRetryLimit 5
>>>>>>>>         ClientRetryDelayMilliSecs 2000
>>>>>>>>         StorageSpace /mnt/pvfs2
>>>>>>>>         LogFile /var/log/pvfs2-server.log
>>>>>>>> </Defaults>
>>>>>>>>
>>>>>>>> <Aliases>
>>>>>>>>         Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>>         Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>>         Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>> </Aliases>
>>>>>>>>
>>>>>>>> <Filesystem>
>>>>>>>>         Name pvfs2-fs
>>>>>>>>         ID 62659950
>>>>>>>>         RootHandle 1048576
>>>>>>>>         <MetaHandleRanges>
>>>>>>>>                 Range pvfs2-io-0-0 4-715827885
>>>>>>>>                 Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>>                 Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>>         </MetaHandleRanges>
>>>>>>>>         <DataHandleRanges>
>>>>>>>>                 Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>>                 Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>>                 Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>>         </DataHandleRanges>
>>>>>>>>         <StorageHints>
>>>>>>>>                 TroveSyncMeta yes
>>>>>>>>                 TroveSyncData no
>>>>>>>>         </StorageHints>
>>>>>>>> </Filesystem>
>>>>>>>> -------------
>>>>>>>> Any suggestions for recovery?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> --Jim
>>>>>>>> _______________________________________________
>>>>>>>> Pvfs2-users mailing list
>>>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Becky Ligon
>>>>>>> OrangeFS Support and Development
>>>>>>> Omnibond Systems
>>>>>>> Anderson, South Carolina
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Becky Ligon
>>>>> OrangeFS Support and Development
>>>>> Omnibond Systems
>>>>> Anderson, South Carolina
>>>>>
>>>>>
>>>>>



More information about the Pvfs2-users mailing list