[Pvfs2-users] pvfs2-client-core crashes with
unrecognized/unimplemented vfs operation of type ff000000
Rene Salmon
salmr0 at bp.com
Thu Mar 26 17:35:39 EST 2009
Hi Sam,
Thanks for the reply. I think we found a workaround. Our code is a
hybrid of MPI and OpenMP: MPI across nodes and OpenMP inside each
node. Each node has several I/O threads writing to PVFS. We usually
send a kill signal to the MPI rank that started the OpenMP threads on
the node. Sometimes that does not kill all of the I/O threads; some are
left hanging, and pvfs2-client-core dies.
If we use pkill instead of plain kill, all of the I/O threads are
killed every time and things work as expected.
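
To illustrate the difference: kill signals the one PID it is given, while pkill signals every process whose name matches. A minimal Python sketch of that name-based selection follows, assuming a Linux /proc filesystem. The function names are hypothetical, and the match is an exact comparison against /proc/<pid>/comm, which is simpler than pkill's regular-expression matching. Whether this accounts for the stray I/O threads going away is not established here.

```python
import os
import signal


def pids_matching(name):
    """Return sorted PIDs whose command name (/proc/<pid>/comm) equals name."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(f"/proc/{entry}/comm", errors="replace") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def signal_all(name, sig=signal.SIGTERM):
    """Send sig to every process named `name`, skipping the caller itself."""
    signalled = []
    for pid in pids_matching(name):
        if pid == os.getpid():
            continue
        try:
            os.kill(pid, sig)
        except (ProcessLookupError, PermissionError):
            continue  # already gone, or not ours to signal
        signalled.append(pid)
    return signalled
```

For example, `signal_all("myjob")` would send SIGTERM to every process named myjob on the node; the name is a placeholder, not the real job name.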
At the moment I can't recompile with debugging enabled, but I will try
that later.
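
Until a debug build is available, the raw addresses in the quoted backtrace can still be collected for a symbol lookup. The sketch below is one possible approach, not something used in this thread: it parses the `[bt]` lines of the client log and builds an `addr2line` command line for the frames that belong to pvfs2-client-core. The function names are hypothetical, and it assumes the log lines are unwrapped and that the binary is not position-independent, so the logged addresses can be passed to `addr2line` directly. Without `-g`, `addr2line` may report only function names or `??`.

```python
import re

# Matches lines such as
#   [E 15:02:54.989302] [bt] pvfs2-client-core(main+0xbc3) [0x40a173]
# capturing the module, the optional symbol+offset, and the address.
BT_LINE = re.compile(r"\[bt\]\s+(\S+?)(?:\(([^)]*)\))?\s+\[(0x[0-9a-fA-F]+)\]")


def backtrace_frames(log_text):
    """Return (module, symbol_or_None, address) for each [bt] line."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in BT_LINE.finditer(log_text)]


def addr2line_command(binary_path, frames, module="pvfs2-client-core"):
    """Build an addr2line argument list for the unique addresses in `module`."""
    addrs = []
    for mod, _symbol, addr in frames:
        if mod == module and addr not in addrs:
            addrs.append(addr)
    # -f prints function names, -e selects the executable to inspect.
    return ["addr2line", "-f", "-e", binary_path] + addrs
```

The resulting list can be handed to `subprocess.run` on a node where the same pvfs2-client-core binary is installed.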
Thanks
Rene
>
> > Hi,
> >
> > We are getting some strange behavior out of pvfs-2.8.1 clients
> > running on some SLES 10 SP1 nodes.
> >
> > The pvfs2 clients can mount the pvfs2 file system with no problems.
> > We then start an MPI job that runs on a small number of nodes. The
> > problem happens when we try to kill the MPI job. As soon as we send
> > the kill signal to the MPI job, several of our pvfs2 client nodes
> > have their pvfs2-client-core daemon die with this message:
> >
> > hpcp6671:~ # ps -ef |grep pvfs
> > root 25767 1 0 12:21 ? 00:00:00 /bphpc5/vol0/salmr0/opt/pvfs-2.8.1/x86_64/sles10sp1/sbin/pvfs2-client -p /bphpc5/vol0/salmr0/opt/pvfs-2.8.1/x86_64/sles10sp1/sbin/pvfs2-client-core
> > root 16117 25767 0 15:02 ? 00:00:00 [pvfs2-client-co]
> >
> >
> >
> > hpcp6671:~ # cat /tmp/pvfs2-client.log
> > [E 12:21:35.567169] PVFS Client Daemon Started. Version 2.8.1
> > [D 12:21:35.567434] [INFO]: Mapping pointer 0x2acdf7aa3000 for I/O.
> > [D 12:21:35.579256] [INFO]: Mapping pointer 0x2acdf8ea5000 for I/O.
> > [E 15:02:54.988860] PVFS2 client: signal 11, faulty address is 0x41d5, from 0x408d81
> > [E 15:02:54.989282] [bt] pvfs2-client-core [0x408d81]
> > [E 15:02:54.989294] [bt] pvfs2-client-core [0x408d81]
> > [E 15:02:54.989302] [bt] pvfs2-client-core(main+0xbc3) [0x40a173]
> > [E 15:02:54.989309] [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2acdf788b154]
> > [E 15:02:54.989315] [bt] pvfs2-client-core [0x403519]
> > [E 15:02:54.991351] Child process with pid 25768 was killed by an uncaught signal 6
>
> Hi Rene,
>
> This is a segfault in the client process. The daemon is restarting
> itself, which may be what the error below is from. I'll have to
> figure out what that 0x408d81 pointer maps to. Might not be all that
> useful though. Would you be willing to recompile with debugging
> enabled (rerun configure with CFLAGS=-g, and then rebuild)? That
> would at least give us line numbers to look at.
>
> >
> > [E 15:02:54.993980] PVFS Client Daemon Started. Version 2.8.1
> > [D 15:02:54.994242] [INFO]: Mapping pointer 0x2b94619a2000 for I/O.
> > [D 15:02:55.008318] [INFO]: Mapping pointer 0x2b9462da4000 for I/O.
> > [E 15:02:55.312456] Got an unrecognized/unimplemented vfs operation of type ff000000.
> > [E 15:02:55.312497] Post of op: PVFS_VFS_OP_INVALID failed!
>
> I would try to fix the above before worrying about this one. It
> could be just fallout from the first failure.
>
> -sam
> >
> >
> >
> > Any ideas?
> >
> > thanks
> > Rene