[PVFS-developers] another 1.6.3pre1 bug

Porter Don PorterDE at mercury.hendrix.edu
Fri Apr 30 19:25:27 EDT 2004


Ok.  I think I have this one patched up now.  I don't completely understand
the fundamental problem, and there could be something else wrong even deeper
that we just haven't caught yet...

The mtime thing wasn't the problem at all, but I went ahead and patched that
file too.

Basically, do_jobs() in (k)pvfs_v1_xfer.c checks for short reads, for
reasons I don't fully understand.  For a while (before 1.6.0), the non-kernel
version did this check only for read ops, while the kernel version did it
for both reads and writes.

This was all well and good until I put in the fix for the rename bug I was
experiencing with the library.  The kernel doesn't track fs_ino's and just
always sticks a zero in there.  The fstat call actually checks the fs_ino.
Because no fs_ino zero existed on the manager, the call would fail.
Coincidentally, the only use of this call in the kernel code is to check for
short reads.  The call would be rejected, and somehow by the time the error
got back to 'cat' it was reported as no space left on the device.

This had always worked by coincidence before because the fs_ino was always
zero in all files created by the kernel.
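
To make the failure concrete, here is a rough sketch of how I understand the
lookup, not the actual manager code.  Every name in it (toy_mgr,
toy_mgr_fstat) is made up, and the error code is an assumption; the real
manager may return something else.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fs_ino values the manager knows about. */
struct toy_mgr {
    const long *known_fs_inos;
    size_t      count;
};

/*
 * Hypothetical fstat-style request handler.  It succeeds only if the
 * fs_ino sent by the client matches one the manager knows about.
 */
static int toy_mgr_fstat(const struct toy_mgr *mgr, long fs_ino)
{
    for (size_t i = 0; i < mgr->count; i++) {
        if (mgr->known_fs_inos[i] == fs_ino)
            return 0;
    }
    /* Assumption: the real error code may differ. */
    return -ENOENT;
}
```

In this model a client that always sends fs_ino 0 only succeeds while the
manager also has a 0 on record, which is the coincidence described above.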

Solution: I put the same restrictions on checking in kpvfs_v1_xfer.c that
are in pvfs_v1_xfer.c and things seemed to start working right again.
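
Roughly, the restriction amounts to something like the sketch below.  This
is not the patch itself: toy_op and toy_should_check_short are invented
names, and the byte-count comparison is my assumption about what makes a
transfer count as "short".

```c
#include <stdbool.h>

/* Hypothetical operation types; not the real PVFS request codes. */
enum toy_op { TOY_OP_READ, TOY_OP_WRITE };

/*
 * Decide whether the extra fstat round trip for short-transfer detection
 * should be made.  Writes never trigger it, matching the restriction in
 * the library version as described above.
 */
static bool toy_should_check_short(enum toy_op op, long requested,
                                   long transferred)
{
    if (op != TOY_OP_READ)
        return false;
    /* Assumption: "short" means fewer bytes came back than were asked for. */
    return transferred < requested;
}
```

With this guard, an append like 'cat i >> a' would never reach the fstat
call, so the bad fs_ino would not come into play on the write path.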

It is my guess, though, that there are at least two problems waiting on us.
1) The check for a short read will probably fail with ENOSPC.  Maybe this is
ok.  2) Any other requests to the manager that check fs_ino will fail unless
we find a place in the kernel to keep fs_ino and make it do the same thing
as the library.  I really wanted to get around doing this, and maybe still
can, but it could break other things.  Thoughts?

I know that is a bit hard to follow, but I don't completely understand how
the kernel code works.  I can try to explain in more detail if needed.

Thanks,
Don

-----Original Message-----
From: Rob Ross
To: Porter Don
Cc: 'pvfs-developers at www.beowulf-underground.org'
Sent: 4/30/04 3:57 PM
Subject: RE: [PVFS-developers] another 1.6.3pre1 bug

Yes, it was my mistake to allow the two versions to be so different in the
first place.  Thanks for tracking this down.  Sorry that it is a problem.

Rob

On Fri, 30 Apr 2004, Porter Don wrote:

> I tracked this one down a bit further.  It only happens when one uses
> kpvfsd.  The user-space daemon seems to work fine.  The only changes that
> have happened were the mtime patch, which was only applied to the pvfsd
> code, not the kpvfsd code.  Thus, the kpvfsd is no longer speaking the same
> protocol as the other pieces.
> 
> This is not the first time we have had this problem with the two code bases.
> If we weren't trying to move away from pvfs1, it would almost be worth the
> trouble to consolidate some of that code with some judicious #define's.
> 
> In the meantime, we all need to keep an eye out for this when making
> changes.
> 
> Hopefully I can figure out what else has changed and get a patch out on the
> list.
> 
> Thanks,
> don
> 
> -----Original Message-----
> From: Porter  Don
> To: 'pvfs-developers at www.beowulf-underground.org'
> Sent: 4/30/04 12:21 PM
> Subject: [PVFS-developers] another 1.6.3pre1 bug
> 
> It seems that the 1.6.3pre1 has started getting incorrect ENOSPC's in the
> kernel module.
> 
> [root at jawa063 floppy]# ls -l
> total 83968006
> -rw-r--r--    1 root     root          156 Apr 30 12:17 a
> drwxr-xr-x    1 dmethe   dmethe       4096 Apr 30 10:06 foo/
> -rw-r--r--    1 root     root           78 Apr 30 11:40 i
> -rw-r--r--    1 doport   doport        468 Apr 30 12:16 interactive
> -rwxrwxr-x    1 doport   doport   42991616000 Apr 23 10:32 testfile*
> -rw-r--r--    1 doport   doport   42991616000 Apr 27 14:58 testout
> [root at jawa063 floppy]# cat i >> a
> cat: write error: No space left on device
> [root at jawa063 floppy]# ls -l
> total 83968006
> -rw-r--r--    1 root     root          234 Apr 30 12:17 a
> drwxr-xr-x    1 dmethe   dmethe       4096 Apr 30 10:06 foo/
> -rw-r--r--    1 root     root           78 Apr 30 11:40 i
> -rw-r--r--    1 doport   doport        468 Apr 30 12:16 interactive
> -rwxrwxr-x    1 doport   doport   42991616000 Apr 23 10:32 testfile*
> -rw-r--r--    1 doport   doport   42991616000 Apr 27 14:58 testout
> 
> =================================
> 
> The cat seems to work, so perhaps the bug is in close...  I can't replicate
> the bug in 1.6.2 or my patched version of 1.6.0.  My guess is that I screwed
> something up in dropping fsize, but I am still looking.  Any
> thoughts/suggestions are welcome.
> 
> Thanks,
> don
> _______________________________________________
> PVFS-developers mailing list
> PVFS-developers at www.beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kernel-req-fix-1.6.3.patch
Type: application/octet-stream
Size: 3124 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs-developers/attachments/20040430/0da9748e/kernel-req-fix-1.6.3.obj

