[PVFS-users] random ll_pvfs_file_write ...downcall
Rob Ross
rross at mcs.anl.gov
Fri Feb 13 01:18:31 EST 2004
Hi,
What kind of processors do you have? Have you applied the recent patch
posted to pvfs-users?
Thanks,
Rob
On Fri, 13 Feb 2004, Kent F. Milfeld wrote:
> Hi,
>
> We just recently installed 1.6.2 (after successfully running 1.5.x).
> When I run 16-processor MPI-IO jobs, the I/O will sometimes fail
> with the following error output from the code (usually /mnt/pvfs
> will become unmounted):
>
> ...
> rank= 9 CLOSE IOERR= 0
> rank= 10 WRITE IOERR= 0 host=compute-9-23
>
> rank= 10 CLOSE IOERR= 0
> rank= 8 WRITE IOERR= 0 host=compute-9-30
>
> rank= 8 CLOSE IOERR= 0
> rank= 6 WRITE IOERR= 8288 host=compute-1-7
> ...
>
> In the /var/log/kern on compute-1-7 I found the following information:
>
> ********************************************************************************
>
> Two runs, one at about 00:09:30 and the other at about 00:17.
>
> compute-1-11
> Feb 13 00:09:30 compute-1-11 kernel: (ll_pvfs.c, 665): ll_pvfs_file_write got error in downcall
> Feb 13 00:14:42 compute-1-11 kernel: (ll_pvfs.c, 459): ll_pvfs_getmeta failed on enqueue for 146.6.250.1:3000/pvfs-meta
> compute-2-28
> compute-1-0
> compute-4-31
> compute-2-4
> compute-1-9
> compute-1-12
> Feb 13 00:18:56 compute-1-12 kernel: (pvfsdev.c, 1118): pvfsdev: setup_buffer() failure.
> Feb 13 00:18:56 compute-1-12 kernel: (ll_pvfs.c, 659): ll_pvfs_file_write failed on 2600340
> compute-2-31
>
> Some results from two days earlier:
>
> Feb 10 15:27:25 compute-1-7 kernel: pvfs: debug = 0x0, maxsz = 16777216 bytes, buffer = dynamic, major = 0
> Feb 11 16:04:14 compute-1-7 kernel: (ll_pvfs.c, 233): ll_pvfs_create failed on enqueue for 146.6.250.1:3000/pvfs-meta/test18
> Feb 11 16:04:14 compute-1-7 kernel: (ll_pvfs.c, 87): ll_pvfs_lookup failed on enqueue for 146.6.250.1:3000/pvfs-meta/test18
> Feb 12 17:56:32 compute-1-7 kernel: (ll_pvfs.c, 665): ll_pvfs_file_write got error in downcall
>
> *********************************************************
>
> [root@compute-1-30 root]# rpm -qa | grep pvfs
> pvfs-1.6.2-1
> contrib-pvfs-config-1.6.2-1
> pvfs-kernel-1.6.2-1
>
> Any idea of what might be happening?
>
> Thanks,
> Kent Milfeld
> TACC, Texas Advanced Computing Center
>
> Kent Milfeld Ph.D. Research Associate
> Texas Advanced Computing Center
> The University of Texas at Austin
> http://www.tacc.utexas.edu/
>
> (512) 475-9411 (main)
> (512) 475-9458 (direct)
> (512) 475-9445 (fax)
> milfeld at tacc.utexas.edu