[Pvfs2-developers] Bad File Entries
Sam Lang
slang at mcs.anl.gov
Tue Jun 29 14:40:55 EDT 2010
Hi Bart,
I haven't had a chance to do a whole lot of debugging of this problem, although I did run your test once. I was using XFS as the underlying filesystem, and after running successfully for a while, the XFS mount just kind of blew up on me and started returning EIO errors for everything. I will try the test again with ext2 (is that what you're using?), but in the meantime, I wanted to address the actual problem you described, which is that you can't easily remove those bad file entries. I've attached a patch which I think will fix the issue, allowing you to remove those bad entries with either rm or pvfs2-rm. I haven't been able to test this fully, but it's only a one-liner, so there shouldn't be any unintended side effects. Do you want to give it a try?
-sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-remove-enoent.patch
Type: application/octet-stream
Size: 726 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20100629/95998d68/fix-remove-enoent.obj
-------------- next part --------------
On Jun 18, 2010, at 3:21 PM, Bart Taylor wrote:
> Yes. I reran the test so that I could grab the actual messages, and this time it was a bit more aggressive. Instead of leaving a file in a bad state, it left my whole directory structure under the file system root in that state. I attached a chunk of log messages from the client and from the server that was timing out. The other servers did not log anything.
>
> I am currently getting this back from pvfs2-fsck:
>
> server 1, exceeding number of handles it declared (42923), currently (43000)
> pvfs28-fsck: ../pvfs2_src/src/apps/admin/pvfs2-fsck.c:1325: handlelist_add_handles: Assertion `0' failed.
> Aborted
>
>
> Bart.
>
>
>
> On Fri, Jun 18, 2010 at 8:15 AM, Sam Lang <slang at mcs.anl.gov> wrote:
>
> Hi Bart,
>
> When you run the script, do you see any timeout error messages in the client log?
>
> -sam
>
> On Jun 18, 2010, at 9:03 AM, Bart Taylor wrote:
>
> > Hey Phil,
> >
> > Yes, it is running 2.8.2. My setup was using 3 servers with 2.6.18-194.el5 kernels and High Availability. I have not had a chance yet to try it on another file system, so I do not know if it is specific to that setup. It has been triggered from more than one client, but the only one I know of for certain was running a 2.6.9-89.ELsmp kernel.
> >
> > Bart.
> >
> >
> > On Fri, Jun 18, 2010 at 7:39 AM, Phil Carns <carns at mcs.anl.gov> wrote:
> > Hi Bart,
> >
> > Is this on 2.8.2? Do you happen to know how many servers are needed to trigger the problem?
> >
> > thanks,
> > -Phil
> >
> >
> > On 06/17/2010 04:08 PM, Bart Taylor wrote:
> >>
> >> Hey guys,
> >>
> >> We have had some problems in the past on 2.6 with file creations leaving bad
> >> files that we cannot delete. Most utilities like ls and rm return "No such file
> >> or directory", and pvfs utilities like viewdist, pvfs2-ls, and pvfs2-rm return
> >> various errors. We have resorted to looking up the parent handle, the fsid, and
> >> filename and using pvfs2-remove-object to delete the entry. But we weren't ever
> >> able to intentionally recreate the problem.
> >>
> >> Recently while testing 2.8, I have been able to reliably trigger a similar
> >> scenario where a file creation fails and leaves a garbage entry that cannot be
> >> deleted in any of the normal ways, so cleanup requires the pvfs2-remove-object
> >> approach. Here is the file and the various outputs for this case:
> >>
> >> [root at client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?--------- ? ? ? ? ? File17027
> >>
> >> [root at client dir]# rm 2010.06.10.28050/File17027
> >> rm: cannot lstat `2010.06.10.28050/File17027': No such file or directory
> >>
> >> [root at client dir]# rm -rf 2010.06.10.28050
> >> rm: cannot remove directory `2010.06.10.28050': Directory not empty
> >>
> >> [root at client dir]# pvfs2-rm 2010.06.10.28050/File17027
> >> Error: An error occurred while removing 2010.06.10.28050/File17027
> >> PVFS_sys_remove: No such file or directory (error class: 0)
> >>
> >> [root at client dir]# pvfs2-stat 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Error stating [2010.06.10.28050/File17027]
> >>
> >> [root at client dir]# pvfs2-viewdist -f 2010.06.10.28050/File17027
> >> PVFS_sys_lookup: No such file or directory (error class: 0)
> >> Could not open 2010.06.10.28050/File17027
> >>
> >> [root at client dir]# ls -l 2010.06.10.28050
> >> total 0
> >> ?--------- ? ? ? ? ? File17027
> >>
> >>
> >> I have included a test script that will spawn off a number of processes, open a
> >> bunch of files, write to each of them, then close them. You can tweak the
> >> options as you want but using 5 processes and 50,000 files will usually create
> >> at least one of these files. Here is an example command:
> >>
> >> $> ulimit -n 1000000 && ./open-file-limit --num-files=50000 --sleep-time=1 --num-processes=5 --directory=/mnt/pvfs2/ --file-size=1
> >>
> >> You may have to do a long listing on any left-over directories to find the file(s).
> >>
> >> I will give any help I can to recreate the bad file or find the cause.
> >> Until then, is there a better (simpler) way to remove these entries, maybe
> >> some sort of utility that doesn't require doing manual handle lookups before
> >> getting the file removed? It would ease some support pain if it were simpler to
> >> fix.
> >>
> >> Thanks for your help,
> >> Bart.
> >>
> >> _______________________________________________
> >> Pvfs2-developers mailing list
> >>
> >> Pvfs2-developers at beowulf-underground.org
> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
> >>
> >>
> >>
> >
> >
>
>
> <server2-log.TXT><client-log.TXT>