[Pvfs2-users] (no subject)

Mohamad Chaarawi mschaara at cs.uh.edu
Wed Oct 22 20:06:17 EDT 2008


Hey all,

I have successfully configured and installed PVFS2 on our cluster. I
managed to get the PVFS2 servers and clients running properly. The mount
point is set up correctly, and I can create and delete files without
problems.
Operating system: openSUSE 11.0

Open MPI (trunk), configured with:
    ./configure CFLAGS=-I/opt/pvfs2-2.7.1/include/ \
        LDFLAGS=-L/opt/pvfs2-2.7.1/lib/ LIBS="-lpvfs2 -lpthread" \
        --prefix=/home/mschaara/OMPI-PVFS2 --with-openib=/usr \
        --with-slurm=/opt/SLURM \
        --with-io-romio-flags=--with-file-system=pvfs2+ufs+nfs

PVFS2 2.7.1, configured with:
    ./configure --with-kernel=/usr/src/linux-2.6.25.11/ \
        --prefix=/opt/pvfs2-2.7.1 --enable-shared

However, when I run an MPI program that opens a PVFS2 file and calls
MPI_File_write_all, one of the PVFS2 servers crashes. I attached the test
file that I'm running (test_write_all.c). If I run the test with 1, 2, or
3 processes, it gives the correct output. With more than 3 processes,
however, it gives the following error:
    mpirun -np 5 ./test_write_all /pvfs2/test_5

    [E 18:48:03.117239] msgpair failed, will retry: Broken pipe
    [E 18:48:05.125048] msgpair failed, will retry: Connection refused
    [E 18:48:07.132856] msgpair failed, will retry: Connection refused
    [E 18:48:09.140665] msgpair failed, will retry: Connection refused
    [E 18:48:11.148474] msgpair failed, will retry: Connection refused
    [E 18:48:13.156282] msgpair failed, will retry: Connection refused
    [E 18:48:13.156282] *** msgpairarray_completion_fn: msgpair to server tcp://shark07:3334 failed: Connection refused
    [E 18:48:13.156282] *** Out of retries.

When I log in to the node (shark07), the server is no longer running. If
I start the server again on that node, PVFS2 works again (verified with
pvfs2-ping).
I saw this in pvfs2-server.log:
    [E 10/22 18:55] src/common/misc/state-machine-fns.c line 289: Error: state machine returned SM_ACTION_TERMINATE but didn't reach terminate
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_next+0x1d5) [0x41f1b5]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x41ec0e]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server(main+0xe3e) [0x4122be]
    [E 10/22 18:55]         [bt] /lib64/libc.so.6(__libc_start_main+0xe6) [0x7f4640020436]
    [E 10/22 18:55]         [bt] /opt/pvfs2-2.7.1/sbin/pvfs2-server [0x40f939]
    [D 10/22 18:55] server_state_machine_terminate 0x7881b0

and this in /var/log/messages:
    shark07 kernel: pvfs2-server[14842]: segfault at 7f6ae09c7ec0 ip 7f6ae09c7ec0 sp 7fffea083628 error 15 in libgcc_s.so.1[7f6ae09c7000+1000]

Any idea what might be wrong with my PVFS2 or Open MPI configuration? Or
could this be a bug somewhere?

Thank you,


-- 
Mohamad Chaarawi
Research Assistant		  http://www.cs.uh.edu/~mschaara
Department of Computer Science	  University of Houston
4800 Calhoun, PGH Room 526        Houston, TX 77204, USA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_write_all.c
Type: text/x-csrc
Size: 1651 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-users/attachments/20081022/299b360d/test_write_all.bin

