[Pvfs2-users] PVFS2 installation problem

Murali Vilayannur murali.vilayannur at gmail.com
Mon Jul 2 19:19:06 EDT 2007


Hi Florin,
Thanks for getting back on that!
This is quite weird. It probably points to some platform-specific library issue.
Since we do use threads, perhaps it is time to rerun configure with thread
usage disabled and see if that helps?

./configure --disable-thread-safety is something you can try.
./configure --enable-nptl-workaround is perhaps also worth a try (but not
together with the previous option) to work around glibc oddities.
Sam, RobL, Pete, any ideas? I am lost. :(
The final alternative is perhaps a live debugging session on your machine, if possible.
thanks,
Murali

On 7/2/07, Florin Isaila <florin.isaila at gmail.com> wrote:
> Hi,
>
> Many thanks, Murali. I have just tried that, but it still gets stuck,
> with an even stranger stack trace:
>
> (gdb) bt
> #0  0x0ff4b2d0 in poll () from /lib/tls/libc.so.6
> #1  0x0ffc871c in ?? () from /lib/tls/libc.so.6
> #2  0x0ffc871c in ?? () from /lib/tls/libc.so.6
> Previous frame identical to this frame (corrupt stack?)
>
> Any other suggestions?
>
> Best regards
> Florin
>
> On 7/2/07, Murali Vilayannur <murali.vilayannur at gmail.com> wrote:
> > Hi Florin,
> > Given that both your backtraces point to epoll_wait(), can you run make
> > clean, then rerun configure with --disable-epoll, rebuild everything
> > and see if that works?
> > If it does work, it probably points to some epoll-specific bug on ppc,
> > either in pvfs2 or in the libepoll code.
> > thanks,
> > Murali
> >
> > On 7/2/07, Florin Isaila <florin.isaila at gmail.com> wrote:
> > > Hi,
> > >
> > > We have installed PVFS2 2.6.3 over Ethernet on a SUSE distribution,
> > > locally on a dual-processor (PowerPC 970FX) machine.
> > >
> > > Some commands, such as pvfs2-ping, pvfs2-mkdir and pvfs2-ls (without
> > > parameters), work fine.
> > >
> > > But we cannot get some other pvfs2-* commands to run. For instance,
> > > pvfs2-cp gets stuck. Here is the gdb backtrace:
> > >
> > > (gdb) bt
> > > #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > > #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> > >     incount=128, outcount=0xffff97b0, maps=0xffff93b0, status=0xffff95b0,
> > >     poll_timeout=10, external_mutex=0x100d2ce0)
> > >     at socket-collection-epoll.c:281
> > > #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > > #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> > >     outcount=0xffff9864, error_code_array=0x100d2b80,
> > >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> > >     context_id=0) at bmi-tcp.c:1303
> > > #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> > >     outcount=0x100d14cc, error_code_array=0x100d2b80,
> > >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> > >     max_idle_time_ms=10, context_id=0) at bmi.c:944
> > > #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > > #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> > >     at thread-mgr.c:815
> > > #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > > #8  0x1007025c in job_testcontext (out_id_array_p=0xffff99d0,
> > >     inout_count_p=0xffff99b8, returned_user_ptr_array=0xffffd1d0,
> > >     out_status_array_p=0xffffa1d0, timeout_ms=10, context_id=1) at job.c:4068
> > > #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=3,
> > >     error_code=0xffffd670) at client-state-machine.c:536
> > > #10 0x1001041c in PINT_client_wait_internal (op_id=3,
> > >     in_op_str=0x100b209c "fs_add", out_error=0xffffd670,
> > >     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > > #11 0x10010734 in PVFS_sys_wait (op_id=3, in_op_str=0x100b209c "fs_add",
> > >     out_error=0xffffd670) at client-state-machine.c:861
> > > #12 0x10035c4c in PVFS_sys_fs_add (mntent=0x100d3030) at fs-add.sm:205
> > > #13 0x1004c220 in PVFS_util_init_defaults () at pvfs2-util.c:1040
> > > #14 0x1000a5c8 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:135
> > >
> > > Other times (rarely), it gets stuck at a different place:
> > >
> > > (gdb) bt
> > > #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > > #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> > >     incount=128, outcount=0xffff9b30, maps=0xffff9730, status=0xffff9930,
> > >     poll_timeout=10, external_mutex=0x100d2ce0)
> > >     at socket-collection-epoll.c:281
> > > #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > > #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> > >     outcount=0xffff9be4, error_code_array=0x100d2b80,
> > >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> > >     context_id=0) at bmi-tcp.c:1303
> > > #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> > >     outcount=0x100d14cc, error_code_array=0x100d2b80,
> > >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> > >     max_idle_time_ms=10, context_id=0) at bmi.c:944
> > > #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > > #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> > >     at thread-mgr.c:815
> > > #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > > #8  0x1007025c in job_testcontext (out_id_array_p=0xffff9d50,
> > >     inout_count_p=0xffff9d38, returned_user_ptr_array=0xffffd550,
> > >     out_status_array_p=0xffffa550, timeout_ms=10, context_id=1) at job.c:4068
> > > #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=28,
> > >     error_code=0xffffda1c) at client-state-machine.c:536
> > > #10 0x1001041c in PINT_client_wait_internal (op_id=28,
> > >     in_op_str=0x100ac1b8 "io", out_error=0xffffda1c,
> > >     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > > #11 0x10010734 in PVFS_sys_wait (op_id=28, in_op_str=0x100ac1b8 "io",
> > >     out_error=0xffffda1c) at client-state-machine.c:861
> > > #12 0x1001b78c in PVFS_sys_io (ref=
> > >       {handle = 1048570, fs_id = 1957135728, __pad1 = -26176},
> > >     file_req=0x100d07d8, file_req_offset=0, buffer=0x40068008,
> > >     mem_req=0x100efbd0, credentials=0xffffe060, resp_p=0xffffda90,
> > >     io_type=PVFS_IO_WRITE) at sys-io.sm:363
> > > #13 0x1000b078 in generic_write (dest=0xffffddb0,
> > >     buffer=0x40068008 "\177ELF\001\002\001", offset=0, count=2469777,
> > >     credentials=0xffffe060) at pvfs2-cp.c:365
> > > #14 0x1000a824 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:180
> > >
> > >
> > > After interrupting the program with Ctrl-C, the files appear to have been
> > > created. Any clue where this could come from? It looks as if the metadata
> > > communication works but the data communication does not.
> > >
> > > Below is the output of the ping command.
> > >
> > > Many thanks
> > > Florin
> > >
> > > pvfs2-ping -m ~/florin/mnt/pvfs2/
> > >
> > > (1) Parsing tab file...
> > >
> > > (2) Initializing system interface...
> > >
> > > (3) Initializing each file system found in tab file:
> > > /home/A40001/u72877927/florin/apps/etc/pvfs2tab...
> > >
> > >    PVFS2 servers: tcp://localhost:55555
> > >    Storage name: pvfs2-fs
> > >    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> > >    /home/A40001/u72877927/florin/mnt/pvfs2: Ok
> > >
> > > (4) Searching for /home/A40001/u72877927/florin/mnt/pvfs2/ in pvfstab...
> > >
> > >    PVFS2 servers: tcp://localhost:55555
> > >    Storage name: pvfs2-fs
> > >    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> > >
> > >    meta servers:
> > >    tcp://localhost:55555
> > >
> > >    data servers:
> > >    tcp://localhost:55555
> > >
> > > (5) Verifying that all servers are responding...
> > >
> > >    meta servers:
> > >    tcp://localhost:55555 Ok
> > >
> > >    data servers:
> > >    tcp://localhost:55555 Ok
> > >
> > > (6) Verifying that fsid 1957135728 is acceptable to all servers...
> > >
> > >    Ok; all servers understand fs_id 1957135728
> > >
> > > (7) Verifying that root handle is owned by one server...
> > >
> > >    Root handle: 1048576
> > >      Ok; root handle is owned by exactly one server.
> > >
> > > =============================================================
> > >
> > > The PVFS2 filesystem at /home/A40001/u72877927/florin/mnt/pvfs2/
> > > appears to be correctly configured.
> > > _______________________________________________
> > > Pvfs2-users mailing list
> > > Pvfs2-users at beowulf-underground.org
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> > >
> >
>

