[Pvfs2-users] Help: pvfs2-server won't start, errors detected
Becky Ligon
ligon at omnibond.com
Thu Apr 5 17:08:36 EDT 2012
Phil/Jim:
Should you run a pvfs2-fsck at this point, maybe in non-destructive mode,
to see if we have dangling entries?
Becky
On Thu, Apr 5, 2012 at 4:27 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> On 04/05/2012 01:47 PM, Jim Kusznir wrote:
>
>> I think it's repaired. After using Phil's method, I got a file for which
>> pvfs2-db-display displayed all content, so I started the server and
>> got:
>> [S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2
>> starting...
>> [E 04/05 10:45] Warning: got invalid handle or key size in
>> dbpf_dspace_iterate_handles().
>> [E 04/05 10:45] Warning: skipping entry.
>> [S 04/05 10:45] PVFS2 Server ready.
>>
>> I believe this means recovery is as complete as possible, and that
>> there's an entry that's missing now; is this correct?
>>
>
> At the very least, the .db file that you have now is entirely valid from
> Berkeley DB's point of view. It looks like there is a stray entry in there
> that PVFS doesn't understand, but it shouldn't interfere with anything.
> You will just see that warning when you start the server.
>
>
>> Is it ready to go back into production (once I update versions of db and
>> pvfs2)?
>>
>
> I would think so. You mentioned originally that some users were seeing
> some "weirdness", so maybe you can ask someone to check whatever data they were
> working with before to see if it looks ok.
>
> -Phil
>
>
>> --Jim
>>
>>
>> On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles<elaine at omnibond.com>
>> wrote:
>>
>>> Try "make develtools".
>>>
>>> -- Elaine
>>>
>>> -----Original Message-----
>>> From: Jim Kusznir [mailto:jkusznir at gmail.com]
>>> Sent: Wednesday, April 04, 2012 3:45 PM
>>> To: Elaine Quarles
>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>> detected
>>>
>>> I patched everything and ran configure and make, but it didn't build
>>> pvfs2-db-display. The .c file is present. I haven't found the magic make
>>> command to cause that to be built either... Suggestions?
>>>
>>> --Jim
>>>
>>> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles<elaine at omnibond.com>
>>> wrote:
>>>
>>>> Sorry for the delay. Attached are db-display.tar. If you expand this
>>>> from the top level directory of your source tree it will create the
>>>> src/apps/devel directory. Makefile.in.patch will patch your
>>>> Makefile.in with the logic necessary to build pvfs2-db-display. Please
>>>> note that it is necessary to run the configure script to update your
>>>> Makefile.
>>>>
>>>> Please send the results of running this utility so we can determine
>>>> whether it is necessary to try continuous forward reading through the
>>>> database, skipping error records or whether we will have to also read
>>>> from the end of the database backwards.
>>>>
>>>> Thanks,
>>>> Elaine
>>>>
>>>> -----Original Message-----
>>>> From: Jim Kusznir [mailto:jkusznir at gmail.com]
>>>> Sent: Wednesday, April 04, 2012 1:56 PM
>>>> To: Elaine Quarles
>>>> Cc: Becky Ligon
>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>>> detected
>>>>
>>>> Any updates? My entire cluster is still offline due to this problem,
>>>> and my users are starting to look for their pitchforks....
>>>>
>>>> Thanks!
>>>> --Jim
>>>>
>>>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles <elaine at omnibond.com> wrote:
>>>>
>>>>> Jim,
>>>>>
>>>>> Could you please check whether your pvfs 2.8.2 distribution contains
>>>>> src/apps/devel/pvfs2-db-display.c? If so you can build it by running
>>>>> "make develtools". If your distribution does not contain this file
>>>>> let me know and I will send a patch.
>>>>>
>>>>> If you already have the utility, please redirect the output and send
>>>>> it so we can see what it has to say about the state of the database
>>>>> and determine the next step from there.
>>>>>
>>>>> Here is the command-line format.
>>>>> Usage: ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>>>> Example: ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>>>
>>>>> Options:
>>>>>   --verbose         Enable verbose output
>>>>>   --help            This message.
>>>>>   --dbpath <path>   The path of the server's StorageSpace. The path
>>>>>                     should contain collections.db and
>>>>>                     storage_attributes.db
>>>>>   --hexdir <dir>    The directory in dbpath that contains
>>>>>                     collection_attributes.db, dataspace_attributes.db
>>>>>                     and keyval.db
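
A minimal sketch of the "redirect the output and send it" step, using only the options listed in the usage text above. The helper names `build_command` and `run_and_capture` are hypothetical, and the path and hexdir are the ones from the usage example, not values from the affected server.

```python
import subprocess


def build_command(dbpath, hexdir, verbose=False, binary="./pvfs2-db-display"):
    """Build the argument list using only the options shown in the usage text."""
    argv = [binary, "--dbpath", dbpath, "--hexdir", hexdir]
    if verbose:
        argv.append("--verbose")
    return argv


def run_and_capture(argv, out_path):
    """Run the utility, writing stdout and stderr to out_path; return the exit code."""
    with open(out_path, "w") as out:
        return subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT).returncode


# Values taken from the example in the usage text; substitute your own.
command = build_command("/tmp/pvfs2-space", "4e3f77a5")
print(" ".join(command))
```

Calling `run_and_capture(command, "db-display.out")` (the output file name is an assumption) would then leave a single file that can be attached to a reply.
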
>>>>>
>>>>> Thanks,
>>>>> Elaine
>>>>>
>>>>> -----Original Message-----
>>>>> From: Jim Kusznir [mailto:jkusznir at gmail.com]
>>>>> Sent: Monday, April 02, 2012 5:57 PM
>>>>> To: ligon at clemson.edu
>>>>> Cc: ligon at omnibond.com; ofs-support.com at clemson.edu;
>>>>> elaine at clemson.edu
>>>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>>>> detected
>>>>>
>>>>> If this is the recommended method for recovery, then let's do it.
>>>>>
>>>>> Just one more question on how pvfs2 runs: is the metadata contained
>>>>> on each server different, or should they all be identical copies? It
>>>>> just occurred to me that my understanding of the metadata was that
>>>>> all three metadata servers were redundant... Or is this a "different
>>>>> metadata" db?
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon<ligon at clemson.edu> wrote:
>>>>>
>>>>>> Jim:
>>>>>>
>>>>>> We have a program called pvfs2-db-display that reads directly
>>>>>> through the Berkeley DB. We don't know for sure, but we might be
>>>>>> able to use whatever information it will give to recover what we
>>>>>> can. The program reads from the database from logical top to
>>>>>> bottom. We can also change it to read from logical bottom to top.
>>>>>> In this way, we MAY be able to recover the good data that is still
>>>>>> there above and below the corrupted area. We've never done this but
>>>>>> we are willing to give it a try.
>>>>>>
>>>>>> Let us know if you'd like to try this!
>>>>>>
>>>>>> Becky
>>>>>> --
>>>>>> Becky Ligon
>>>>>> HPC Admin Staff
>>>>>> PVFS/OrangeFS Developer
>>>>>> Clemson University/Omnibond.com OrangeFS Support
>>>>>> 864-650-4065
>>>>>>
>>>>>>> Your solution sounds like what I am trying to do; I'd prefer to
>>>>>>> install db4 into /opt.
>>>>>>>
>>>>>>> If I can get your spec file or srpm, I'd greatly appreciate it!
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon <ligon at omnibond.com> wrote:
>>>>>>>
>>>>>>>> Jim:
>>>>>>>>
>>>>>>>> We downloaded the software from the Oracle site and created an rpm
>>>>>>>> from that. We are running CentOS 5 on our production servers with
>>>>>>>> kernel=2.6.18-238.9.1.el5 and have been running a version of db4
>>>>>>>> for at least the past 3 years. So, you should be able to create
>>>>>>>> the rpm. I can send you the rpm that we are using but it is
>>>>>>>> tailored to our environment; we install db4 in /opt/db4, because
>>>>>>>> other items depend on the installed version.
>>>>>>>>
>>>>>>>> Becky
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir <jkusznir at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I've been trying to build a db4 rpm on my centos box, but it
>>>>>>>>> appears it has dependencies that require an OS upgrade...how did
>>>>>>>>> you get anything newer than the stock db4 installed on centos5?
>>>>>>>>>
>>>>>>>>> --Jim
>>>>>>>>>
>>>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon<ligon at omnibond.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Jim:
>>>>>>>>>>
>>>>>>>>>> I understand your situation. Here at Clemson University, we
>>>>>>>>>> went through the same situation a couple of years ago. Now, we
>>>>>>>>>> back up the metadata databases. We don't have the space to back
>>>>>>>>>> up our data either!
>>>>>>>>>>
>>>>>>>>>> Under no circumstances should you run pvfs2-fsck. If you run this
>>>>>>>>>> command in the destructive mode, then we won't be able to help at
>>>>>>>>>> all. If you're willing, Omnibond MAY be able to write some
>>>>>>>>>> utilities that will help you recover most of the data. You will
>>>>>>>>>> have to speak to Boyd Wilson (boyd.wilson at omnibond.com) and
>>>>>>>>>> work something out.
>>>>>>>>>>
>>>>>>>>>> Becky Ligon
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <jkusznir at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I made no changes to my environment; it was up and running just
>>>>>>>>>>> fine.
>>>>>>>>>>>
>>>>>>>>>>> I ran db_recover, and it immediately returned, with no apparent
>>>>>>>>>>> sign of doing anything but creating a log.000000001 file.
>>>>>>>>>>>
>>>>>>>>>>> I have the centos DB installed, db4-4.3.29-10.el5
>>>>>>>>>>>
>>>>>>>>>>> I have no backups; this is my high performance filesystem of
>>>>>>>>>>> 99TB; it is the largest disk we have and we therefore have no
>>>>>>>>>>> means of backing it up. We don't have anything big enough to
>>>>>>>>>>> hold that much data.
>>>>>>>>>>>
>>>>>>>>>>> Is there any hope? Can we just identify and delete the files
>>>>>>>>>>> that have the db damage on them? (Note that I don't even have
>>>>>>>>>>> anywhere to back up this data to temporarily if we do get it
>>>>>>>>>>> running, so I'd need to "fix in place".)
>>>>>>>>>>>
>>>>>>>>>>> thanks!
>>>>>>>>>>> --Jim
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon
>>>>>>>>>>> <ligon at omnibond.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Jim:
>>>>>>>>>>>>
>>>>>>>>>>>> If you haven't made any recent changes to your pvfs
>>>>>>>>>>>> environment or Berkeley DB installation, then it looks like
>>>>>>>>>>>> you have a corrupted metadata database.
>>>>>>>>>>>> There is no way to easily recover. Sometimes, the Berkeley
>>>>>>>>>>>> DB command "db_recover" might work, but PVFS doesn't have
>>>>>>>>>>>> transactions turned on, so normally it doesn't work. It's
>>>>>>>>>>>> worth a try, just to be sure.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any recent backups of the databases? If so,
>>>>>>>>>>>> then you will need to use a set of backups that were created
>>>>>>>>>>>> around the same time, so the databases will be somewhat
>>>>>>>>>>>> consistent with each other.
>>>>>>>>>>>>
>>>>>>>>>>>> Which version of Berkeley DB are you using? We have had
>>>>>>>>>>>> corruption issues with older versions of it. We strongly
>>>>>>>>>>>> recommend 4.8 or higher. There are some known problems with
>>>>>>>>>>>> threads in the older versions.
>>>>>>>>>>>>
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir
>>>>>>>>>>>> <jkusznir at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I got some notices from my users with "weirdness with pvfs2"
>>>>>>>>>>>>> this morning, and went and investigated. Eventually, I
>>>>>>>>>>>>> found the following on one of my 3 servers:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>>>>>>>> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>>>>>>>> [E 03/30 12:23] Warning: skipping entry.
>>>>>>>>>>>>> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>>>>>>>> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>>>>>>>> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>>>>>>>> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------
>>>>>>>>>>>>> pvfs2-fs.conf:
>>>>>>>>>>>>> -----------
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Defaults>
>>>>>>>>>>>>> UnexpectedRequests 50
>>>>>>>>>>>>> EventLogging none
>>>>>>>>>>>>> LogStamp datetime
>>>>>>>>>>>>> BMIModules bmi_tcp
>>>>>>>>>>>>> FlowModules flowproto_multiqueue
>>>>>>>>>>>>> PerfUpdateInterval 1000
>>>>>>>>>>>>> ServerJobBMITimeoutSecs 30
>>>>>>>>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>>>>>>>>> ClientJobBMITimeoutSecs 300
>>>>>>>>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>>>>>>>>> ClientRetryLimit 5
>>>>>>>>>>>>> ClientRetryDelayMilliSecs 2000
>>>>>>>>>>>>> StorageSpace /mnt/pvfs2
>>>>>>>>>>>>> LogFile /var/log/pvfs2-server.log
>>>>>>>>>>>>> </Defaults>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Aliases>
>>>>>>>>>>>>> Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>>>>>>>> Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>>>>>>>> Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>>>>>>>> </Aliases>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <Filesystem>
>>>>>>>>>>>>> Name pvfs2-fs
>>>>>>>>>>>>> ID 62659950
>>>>>>>>>>>>> RootHandle 1048576
>>>>>>>>>>>>> <MetaHandleRanges>
>>>>>>>>>>>>> Range pvfs2-io-0-0 4-715827885
>>>>>>>>>>>>> Range pvfs2-io-0-1 715827886-1431655767
>>>>>>>>>>>>> Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>>>>>>>> </MetaHandleRanges>
>>>>>>>>>>>>> <DataHandleRanges>
>>>>>>>>>>>>> Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>>>>>>>> Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>>>>>>>> Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>>>>>>>> </DataHandleRanges>
>>>>>>>>>>>>> <StorageHints>
>>>>>>>>>>>>> TroveSyncMeta yes
>>>>>>>>>>>>> TroveSyncData no
>>>>>>>>>>>>> </StorageHints>
>>>>>>>>>>>>> </Filesystem>
>>>>>>>>>>>>> -------------
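
As a sanity check on the configuration quoted above, the following sketch confirms that the six configured handle ranges are contiguous and non-overlapping, and looks up which server owns the two ranges named in the "Error adding handle range" log line. The range values are copied from the quoted pvfs2-fs.conf; the function names are hypothetical, and the script only inspects the configuration text, not the databases themselves.

```python
# Handle ranges copied from the pvfs2-fs.conf quoted above.
META_RANGES = {
    "pvfs2-io-0-0": (4, 715827885),
    "pvfs2-io-0-1": (715827886, 1431655767),
    "pvfs2-io-0-2": (1431655768, 2147483649),
}
DATA_RANGES = {
    "pvfs2-io-0-0": (2147483650, 2863311531),
    "pvfs2-io-0-1": (2863311532, 3579139413),
    "pvfs2-io-0-2": (3579139414, 4294967295),
}


def parse_ranges(text):
    """Parse a string such as '10-20,30-40' into a list of (low, high) tuples."""
    pairs = []
    for part in text.split(","):
        low, high = part.strip().split("-")
        pairs.append((int(low), int(high)))
    return pairs


def find_gaps_or_overlaps(ranges):
    """Return (previous, current) pairs of ranges that are not exactly adjacent."""
    ordered = sorted(ranges)
    problems = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] != prev[1] + 1:
            problems.append((prev, cur))
    return problems


def owners_of(ranges):
    """Return the set of servers whose configured meta or data range matches."""
    owners = set()
    for table in (META_RANGES, DATA_RANGES):
        for server, rng in table.items():
            if rng in ranges:
                owners.add(server)
    return owners


all_ranges = list(META_RANGES.values()) + list(DATA_RANGES.values())
# Range string copied from the "Error adding handle range" log line.
failed = parse_ranges("1431655768-2147483649,3579139414-4294967295")

print("gaps or overlaps:", find_gaps_or_overlaps(all_ranges))
print("servers owning the failed ranges:", sorted(owners_of(failed)))
```

With the values above, the ranges turn out to be contiguous and both failed ranges belong to pvfs2-io-0-2, which is consistent with the configuration being intact and the problem lying in that server's databases.
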
>>>>>>>>>>>>> Any suggestions for recovery?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> --Jim
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Pvfs2-users mailing list
>>>>>>>>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>>>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Becky Ligon
>>>>>>>>>>>> OrangeFS Support and Development Omnibond Systems Anderson,
>>>>>>>>>>>> South Carolina
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Becky Ligon
>>>>>>>>>> OrangeFS Support and Development Omnibond Systems Anderson,
>>>>>>>>>> South Carolina
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Becky Ligon
>>>>>>>> OrangeFS Support and Development
>>>>>>>> Omnibond Systems
>>>>>>>> Anderson, South Carolina
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>
>
>
--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina