[PVFS-users] crash when no place left

Rob Ross rross at mcs.anl.gov
Fri Jun 25 12:39:48 EDT 2004


Hi,

I've attached a patch that (in my configuration anyway) fixes the problem.  
Basically I've added a new parameter to a function so that we can keep 
track of data read from the socket that didn't make it into the file.
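
As a rough illustration of the idea (a simplified sketch, not the patch 
itself): the names toy_file, toy_write and toy_remaining below are 
hypothetical, and the way nospc gets set here is an assumption.  The point 
is only that the caller has to account for bytes that came off the socket 
whether or not they were stored, otherwise the access never finishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file on a nearly full partition. */
struct toy_file {
	int64_t used;      /* bytes stored so far */
	int64_t capacity;  /* total bytes the "disk" can hold */
};

/*
 * Hypothetical analogue of do_write(): `size` bytes have already been read
 * from the socket.  Store as many as fit, return that count, and report the
 * bytes that could not be stored through *lost_data.
 */
static int64_t toy_write(struct toy_file *f, int64_t size,
	int8_t *nospc, int64_t *lost_data)
{
	int64_t room, comp;

	assert(f != NULL && nospc != NULL && lost_data != NULL);
	*lost_data = 0;

	room = f->capacity - f->used;
	comp = size < room ? size : room;
	f->used += comp;

	if (comp < size) {
		*nospc = 1;                 /* assumed meaning: out of space */
		*lost_data = size - comp;   /* read from socket, never stored */
	}
	return comp;
}

/*
 * Caller-side accounting: the access is done once every byte taken from the
 * socket is accounted for, whether it was stored or dropped.
 */
static int64_t toy_remaining(int64_t remaining, int64_t comp,
	int64_t lost_data)
{
	return remaining - (comp + lost_data);
}
```

If the caller subtracted only the stored count, the remaining size would 
never reach zero once the disk fills up.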

Please let me know if this helps,

Rob

On Wed, 19 May 2004, Rob Ross wrote:

> Hi,
> 
> Using the kpvfsd I find that the cp command doesn't seem to return, 
> although I was able to ctrl-c the cp.  The mgr and iod are still up and 
> running ok.  I was able, sort of, to unmount, but I cannot mount again; 
> the mount.pvfs hangs waiting for something or other.
> 
> That "md_stat: lstat: ..." message is normal, and I just took it out so we 
> won't have to look at it in the future.
> 
> I'm going to try again with the user-space pvfsd and see if I can narrow 
> down what is going on.  It appears that the iod and mgr are handling this 
> just fine, and that there is simply a problem on the client side.
> 
> We're trying to get another prerelease out ASAP, so a fix for this may not 
> make that cut.  I will try to get this fixed before the 1.6.3 release.
> 
> Thanks,
> 
> Rob
> 
> On Tue, 18 May 2004, Rob Ross wrote:
> 
> > Thanks for the problem report!  It is normal for the mgr to not log 
> > anything in this case -- the mgr is not involved in write operations.
> > 
> > I'll see if I can replicate this here.
> >
> > On Tue, 18 May 2004 Stadrim.DRIM.CETMEF at i-carre.net wrote:
> > 
> > > I have 10 nodes running the iod server with 60 GB on each and one
> > > other running the mgr server.
> > > 
> > > On a client (with pvfsd and the kernel module), if I try to copy a
> > > file when all the servers are full, the cp command never returns. It is
> > > impossible to kill this command (SIGKILL doesn't work), but killing
> > > the mgr process sometimes works. In the worst case, I have to reboot
> > > the client. I think that the kernel module waits for something that
> > > never happens.
> > > 
> > > In the log files, I found that all the iod servers notice that there is
> > > no space left on their local partition, but not the mgr or the pvfsd
> > > on the client.
> > > 
> > > The mgr log file contains many "md_stat: lstat:: No such file
> > > or directory" messages; there is nothing in the client's log.
> > > 
> > > PVFS 1.6.2 runs on Mandrake 9.0 with a custom 2.4.26 kernel.
> 
> _______________________________________________
> PVFS-users mailing list
> PVFS-users at www.beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
> 
> 
-------------- next part --------------
Index: iod/jobs.c
===================================================================
RCS file: /projects/cvsroot/pvfs/iod/jobs.c,v
retrieving revision 1.46
diff -r1.46 jobs.c
49c49
< 								int fd, int8_t *);
---
> 								int fd, int8_t *, int64_t *);
436c436
< 	int64_t comp;
---
> 	int64_t comp, lost_data;
526c526
< 						a_p->u.rw.file_p->fd, nospc);
---
> 						a_p->u.rw.file_p->fd, nospc, &lost_data);
534c534,535
< 			if (!(a_p->u.rw.size -= comp)) /* done with access */ {
---
> 			if (!(a_p->u.rw.size -= (comp + lost_data))) /* done with access */
> 			{
552c553
< 			a_p->u.rw.off  += comp;
---
> 			a_p->u.rw.off  += comp + lost_data;
922,923c923,929
< static int64_t do_write(ainfo_p a_p, int sock, int64_t loc, int64_t size, int fd,
< 	int8_t *nospc)
---
> static int64_t do_write(ainfo_p a_p,
> 								int sock,
> 								int64_t loc,
> 								int64_t size,
> 								int fd,
> 								int8_t *nospc,
> 								int64_t *lost_data)
928a935,936
> 	*lost_data = 0;
> 
1004a1013,1019
> 			   if (comp > 0) {
> 					*lost_data = a_p->u.rw.wb_count - comp;
> 					total += comp;
> 				}
> 				else {
> 					*lost_data = wsize;
> 				}

