[Pvfs2-users] filesystem errors - pvfs2-fsck failing
Michael Moore
mtmoore at clemson.edu
Mon Jan 17 10:07:27 EST 2011
I'm glad that cleaned up the problem. From what I saw, the statfs calls
use handle counts that are maintained in memory. On a restart of the
servers, the in-memory handle lists are regenerated using the same calls
that the iterate-handles management calls use.
I don't have an answer for how the in-memory handle counts got out of
sync.
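
As a rough illustration of the kind of mismatch described above, here is a
toy Python model. It is not PVFS2 code: ToyHandleStore, its methods, and the
update_cache flag are all made up for the example, and it does not claim to
show how the real counts drifted. It only shows a cached counter disagreeing
with an enumeration of the handles, and a restart rebuilding the counter from
that enumeration.

```python
class ToyHandleStore:
    """Toy stand-in for a storage server; hypothetical, not PVFS2 code."""

    def __init__(self, handles):
        self._handles = set(handles)              # authoritative handle set
        self._cached_count = len(self._handles)   # in-memory counter

    def iterate_handles(self):
        """Enumerate handles, as an iterate-handles style call would."""
        return sorted(self._handles)

    def statfs_count(self):
        """Report the cached in-memory count, as a statfs style call would."""
        return self._cached_count

    def remove(self, handle, update_cache=True):
        """Remove a handle; update_cache=False simulates a missed update."""
        if handle in self._handles:
            self._handles.remove(handle)
            if update_cache:
                self._cached_count -= 1

    def restart(self):
        """Rebuild the in-memory count by enumerating the handles."""
        self._cached_count = len(self.iterate_handles())


def check(store):
    """Return (enumerated, reported) counts, the pair a checker compares."""
    return len(store.iterate_handles()), store.statfs_count()


store = ToyHandleStore(range(10))
store.remove(3, update_cache=False)   # counter misses one removal
print(check(store))                   # enumerated and reported counts differ
store.restart()
print(check(store))                   # counts agree again after the rebuild
```

In this sketch the two counts disagree until restart() is called, which
matches the behavior Bill saw after rebooting the I/O servers.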
If you run into other fsck issues feel free to ping the list.
Thanks,
Michael
On Mon, Jan 17, 2011 at 09:24:48AM -0500, Bill Wichser wrote:
> The reboot of the I/O servers has now cleared up this failure. There
> is still some inconsistency in the filesystem when running pvfs2-fsck,
> but I believe that I can clean it up.
>
> Thanks,
> Bill
>
> Michael Moore wrote:
> > Hi Bill,
> >
> > Sorry for the delay. The difference appears to be between what the
> > management iterate handles call returns and what statfs returns to fsck.
> > I'm looking now to get a better understanding of how statfs and the trove
> > ledger stuff get their counts versus how iterate handles counts them.
> >
> > In the meantime, have the server processes been restarted since this
> > behavior started occurring? If not, is that a possibility?
> >
> > Sorry again for the delay in getting back with you on this issue.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Jan 04, 2011 at 01:48:20PM -0500, Bill Wichser wrote:
> >
> >> I've deleted those files with the native pvfs2-rm command, which informed
> >> me to run pvfs2-fsck. Running pvfs2-validate turned up several more,
> >> which I removed, so there is nothing left to run pvfs2-viewdist on.
> >>
> >> FWIW I'm running a metadata server on the head node and the I/O servers
> >> on 16 compute nodes, version 2.8.2.
> >>
> >> [root@della3 bill]# pvfs2-stat /scratch/pvfs2
> >> -------------------------------------------------------
> >> File Name : /scratch/pvfs2
> >> Relative Name : /
> >> fs ID : 1922795883
> >> Handle : 1048576
> >> Mask : 504000177
> >> Permissions : 777
> >> Type : Directory
> >> Size : 4096
> >> Owner : 0 (root)
> >> Group : 0 (root)
> >> atime : 1294130281 (Tue Jan 4 03:38:01 2011)
> >> mtime : 1293499466 (Mon Dec 27 20:24:26 2010)
> >> ctime : 1293499462 (Mon Dec 27 20:24:22 2010)
> >> dir entries : 6
> >>
> >> [root@della3 bill]# pvfs2-validate -d /scratch/pvfs2/
> >> pvfs2-validate starting validation at object [/scratch/pvfs2]
> >> pvfs2-validate done validating object tree at [/scratch/pvfs2]
> >>
> >> [root@della3 bill]# pvfs2-fsck -p -m /scratch/pvfs2
> >> # Current FSID is 1922795883.
> >> Ugh! Server 1, Received 64789 total handles instead of 64792
> >>
> >> So the total handles have changed, as expected because of the removals,
> >> but a mismatch remains (3 handles now, 4 before). To be honest, when I made that
> >> filesystem, I didn't run an fsck so it could be a remnant from last
> >> month. I don't know. But we have a bunch of Genomics people wreaking
> >> havoc with those strange files in kernel space. I was able to do a
> >> pvfs2-ls on them (user space) but didn't really pursue it, hoping instead
> >> to just make the problem go away!
> >>
> >> Thanks,
> >> Bill
> >>
> >> Michael Moore wrote:
> >>
> >>> Hi Bill,
> >>>
> >>> Can you provide the output of pvfs2-stat on the parent directory
> >>> and affected files and 'pvfs2-viewdist -f <path>' on the affected files?
> >>>
> >>> Do you see any complaints in the server logs related to accessing these
> >>> files?
> >>>
> >>> Michael
> >>>
> >>> On Mon, Jan 03, 2011 at 08:04:02AM -0500, Bill Wichser wrote:
> >>>
> >>>
> >>>> Having some trouble with my filesystem. There are a few files which did
> >>>> not get written correctly by one of the users and some corruption looks
> >>>> to be present.
> >>>>
> >>>> # ls -lR
> >>>> ./3689_old:
> >>>> total 0
> >>>> ?--------- ? ? ? ? ? clusmax.out
> >>>>
> >>>> ./3764_old:
> >>>> total 0
> >>>> ?--------- ? ? ? ? ? traj.xtc
> >>>>
> >>>> These cannot be removed. In the past, a run of pvfs2-fsck seemed to
> >>>> correct these types of problems but this time all I get is the following
> >>>> message and the fsck terminates. I'm not sure how to correct this.
> >>>> Googling leads me to the source code. Anyone have any suggestions?
> >>>>
> >>>> # pvfs2-fsck -p -v -m /scratch/pvfs2
> >>>> # Current FSID is 1922795883.
> >>>> Ugh! Server 1, Received 64796 total handles instead of 64800
> >>>>
> >>>>
> >>>> Thanks, and Happy New Year to all!
> >>>> Bill
> >>>>
> >>>> _______________________________________________
> >>>> Pvfs2-users mailing list
> >>>> Pvfs2-users at beowulf-underground.org
> >>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> >>>>
> >>>>