[PVFS-users] random ll_pvfs_file_write ...downcall

Rob Ross rross at mcs.anl.gov
Fri Feb 13 01:18:31 EST 2004


Hi,

What kind of processors do you have?  Have you applied the recent patch 
posted to pvfs-users?

Thanks,

Rob

On Fri, 13 Feb 2004, Kent F. Milfeld wrote:

> Hi,
>
>   We recently installed PVFS 1.6.2 (after running 1.5.x successfully).
>   When I run 16-processor MPI-IO jobs, the I/O sometimes fails with the
>   following error output from the code (usually /mnt/pvfs also becomes
>   unmounted):
>
> ...
> 
>  rank=           9  CLOSE IOERR=           0
>  rank=          10  WRITE IOERR=           0    host=compute-9-23
>  rank=          10  CLOSE IOERR=           0
>  rank=           8  WRITE IOERR=           0    host=compute-9-30
>  rank=           8  CLOSE IOERR=           0
>  rank=           6  WRITE IOERR=        8288    host=compute-1-7
> ...
> 
>
>   In /var/log/kern on compute-1-7 I found the following information:
>
> ********************************************************************************
>
> Two runs, one at about 00:09:30 and the other at about 00:17.
> 
> compute-1-11
> Feb 13 00:09:30 compute-1-11 kernel: (ll_pvfs.c, 665): ll_pvfs_file_write got error in downcall
> Feb 13 00:14:42 compute-1-11 kernel: (ll_pvfs.c, 459): ll_pvfs_getmeta failed on enqueue for 146.6.250.1:3000/pvfs-meta
> compute-2-28
> compute-1-0
> compute-4-31
> compute-2-4
> compute-1-9
> compute-1-12
> Feb 13 00:18:56 compute-1-12 kernel: (pvfsdev.c, 1118): pvfsdev: setup_buffer() failure.
> Feb 13 00:18:56 compute-1-12 kernel: (ll_pvfs.c, 659): ll_pvfs_file_write failed on 2600340
> compute-2-31
> 
>
> Some results from two days earlier:
>
> Feb 10 15:27:25 compute-1-7 kernel: pvfs: debug = 0x0, maxsz = 16777216 bytes, buffer = dynamic, major = 0
> Feb 11 16:04:14 compute-1-7 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for 146.6.250.1:3000/pvfs-meta/test18
> Feb 11 16:04:14 compute-1-7 kernel: (ll_pvfs.c, 87): ll_pvfs_lookup failed on enqueue for 146.6.250.1:3000/pvfs-meta/test18
> Feb 12 17:56:32 compute-1-7 kernel: (ll_pvfs.c, 665): ll_pvfs_file_write got error in downcall
> 
>
> *********************************************************
>
> [root@compute-1-30 root]# rpm -qa | grep pvfs
> pvfs-1.6.2-1
> contrib-pvfs-config-1.6.2-1
> pvfs-kernel-1.6.2-1
> 
>
> Any idea what might be happening?
>
> Thanks,
> Kent Milfeld
> TACC, Texas Advanced Computing Center
>
> Kent Milfeld  Ph.D.  Research Associate
> Texas Advanced Computing Center
> The University of Texas at Austin
> http://www.tacc.utexas.edu/
>
> (512) 475-9411 (main)
> (512) 475-9458 (direct)
> (512) 475-9445 (fax)
> milfeld at tacc.utexas.edu
> 
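If it helps when gathering more data, the kernel messages quoted above could be grouped per node with something like the sketch below. It is only an illustration: it assumes the log lines have already been collected in one place with wrapped lines rejoined, and the function name group_pvfs_errors and the regular expression are made up for this example, not part of PVFS. Lines without a "(file, line):" tag, such as the module load message, are skipped.

```python
import re
from collections import defaultdict

# Matches kernel messages of the form quoted above, e.g.
# "Feb 13 00:09:30 compute-1-11 kernel: (ll_pvfs.c, 665): ll_pvfs_file_write got error in downcall"
# The pattern is an assumption based only on those sample lines.
PVFS_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"\((?P<src>[\w.]+),\s*(?P<lineno>\d+)\):\s+(?P<msg>.*)$"
)


def group_pvfs_errors(lines):
    """Group tagged PVFS kernel messages by host; non-matching lines are skipped."""
    by_host = defaultdict(list)
    for line in lines:
        m = PVFS_LINE.match(line.strip())
        if m:
            by_host[m["host"]].append(
                (m["stamp"], m["src"], int(m["lineno"]), m["msg"])
            )
    return dict(by_host)


if __name__ == "__main__":
    sample = [
        "Feb 13 00:09:30 compute-1-11 kernel: (ll_pvfs.c, 665): "
        "ll_pvfs_file_write got error in downcall",
        "Feb 13 00:18:56 compute-1-12 kernel: (pvfsdev.c, 1118): "
        "pvfsdev: setup_buffer() failure.",
    ]
    for host, events in sorted(group_pvfs_errors(sample).items()):
        for stamp, src, lineno, msg in events:
            print(f"{host}  {stamp}  {src}:{lineno}  {msg}")
```

Sorting the result by host and timestamp should make it easier to see whether the write failures line up with a particular node or with a particular run.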


