[Pvfs2-users] PVFS2 over Infiniband

Dardo D Kleiner - CONTRACTOR dkleiner at cmf.nrl.navy.mil
Mon Mar 17 10:49:09 EST 2008


Thanks for the reply, comments inline...

Pete Wyckoff wrote:
> dkleiner at cmf.nrl.navy.mil wrote on Thu, 13 Mar 2008 13:14 -0400:
>> By tracing the call down through libibverbs, I discovered that
>> the call path went into some sort of compatibility layer in
>> OFED and ended up hitting __ibv_create_cq_1_0 (in
>> src/userspace/libibverbs/compat-1_0.c) instead of the
>> __ibv_create_cq function in src/userspace/libibverbs/src/verbs.c.
>> I was able to "correct" this by forcibly linking libpvfs2.so
>> with libibverbs, so there's likely some symbol versioning black
>> magic going on here.  Perhaps Florin could confirm this to be
>> his problem as well.
> 
> I was just debugging this exact issue in the context of another
> project and app and discovered that it is necessary to add -libverbs
> to the "gcc -shared" command when building libpvfs2.so.  I'm still a
> bit fuzzy on exactly why.  Checked in a patch to mainline just now,
> with this message:
> 
>     shared lib deps
>     
>     For proper symbol versioning, at shared library build time, it is necessary
>     to specify all the other shared libraries that will be required to run
>     the one we are about to create.  The particular place this shows up as a
>     problem is for libibverbs.so when building bmi ib.  If you don't tell the
>     linker that you'll need libibverbs.so later, at runtime, it will pick up
>     the 1.0 rather than the 1.1 symbols.
> 

Good to know I wasn't just flailing a hack in there; that took
me some frustrating hours to pinpoint...
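
For anyone wanting to check whether a built library actually records the
dependency, one option is to look for a NEEDED entry for libibverbs in its
dynamic section. Below is a minimal Python sketch, assuming binutils'
readelf is installed. The helper names (needed_libs, depends_on,
check_library) are made up for illustration and are not part of PVFS2 or
OFED.

```python
import re
import shutil
import subprocess

# Matches lines of `readelf -d` output such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libibverbs.so.1]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libs(readelf_output):
    """Return the DT_NEEDED library names found in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def depends_on(readelf_output, stem):
    """True if any NEEDED entry is <stem>.so or a versioned <stem>.so.N."""
    return any(name.startswith(stem + ".so")
               for name in needed_libs(readelf_output))


def check_library(path, stem="libibverbs"):
    """Run `readelf -d` on path and report whether it needs <stem>.

    Hypothetical helper: requires readelf (binutils) on PATH.
    """
    if shutil.which("readelf") is None:
        raise RuntimeError("readelf not found on PATH")
    result = subprocess.run(["readelf", "-d", path], check=True,
                            capture_output=True, text=True)
    return depends_on(result.stdout, stem)
```

Calling check_library("/path/to/libpvfs2.so") (the path is a placeholder)
should return True for a library that was linked with -libverbs, and False
for one that was not.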

>> So, I get it up and running (VFS interface and all), and
>> quickly hit a failure trying to do a basic
>> "dd if=/dev/zero of=testfile bs=4M count=10000" on my pvfs2
>> mount.  Smaller tests seem to work ok (e.g. count=100).
>> Attached is the pvfs-client.log output - and before I try to get
>> our IB guys involved I wanted to see if anything jumped out to
>> the PVFS2 developer community (perhaps BMI related?) or if I
>> could get some help debugging it further.
> 
>> [D 15:46:56.524470] [INFO]: Mapping pointer 0x2b09efdf4000 for I/O.
>> [D 15:46:56.550917] [INFO]: Mapping pointer 0x2b09f11f6000 for I/O.
>> [E 15:49:41.383093] fp_multiqueue_cancel: flow proto cancel called on 0x5acfe8
>> [E 15:49:41.383136] handle_io_error: flow proto error cleanup started on 0x5acfe8, error_code: -1610613121
> 
> Something timed out.  Check the server logs and see if they have anything
> to say.  Once an operation is cancelled, recovery is messy.  Would be good
> to understand why this cancellation happened.

Nothing at all in the server logs with debugging turned off.  I then set
my gossip mask to "flow,flowproto,network,cancel" and got tons of output,
but a cursory scan showed little of interest except:

[E 03/14 09:31] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 4322.
[E 03/14 09:31] fp_multiqueue_cancel: flow proto cancel called on 0x2aaaaad3c060
[D 03/14 09:31] fp_multiqueue_cancel: called on active flow, 524288 bytes transferred.
[D 03/14 09:31] flowproto-multiqueue handle_io_error() called for flow 0x2aaaaad3c060.
[E 03/14 09:31] handle_io_error: flow proto error cleanup started on 0x2aaaaad3c060, error_code: -1610613121
[D 03/14 09:31] flowprotocol cleanup: unposting BMI operation.
[D 03/14 09:31] BMI_cancel: cancel id 46912498833328
[D 03/14 09:31] memcache_deregister: dec refcount [0] 0x2aaaaac15010 len 262144 (via 0x2aaaaac15010 len 262144) refcnt now 1.
[E 03/14 09:31] PVFS2 server: signal 11, faulty address is 0x20, from 0x446e9a
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(memcache_deregister+0x3a) [0x446e9a]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(memcache_deregister+0x3a) [0x446e9a]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server [0x446506]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(PINT_thread_mgr_bmi_cancel+0xaa) [0x42999a]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server [0x44d920]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server [0x44dcf4]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server [0x44e067]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(PINT_flow_cancel+0x42) [0x422c72]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(job_flow_cancel+0x3d) [0x42764d]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(job_time_mgr_expire+0x1ca) [0x42a05a]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server [0x43d8a9]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(PINT_state_machine_invoke+0xc5) [0x41d515]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(PINT_state_machine_next+0xab) [0x41d7db]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(PINT_state_machine_continue+0x1e) [0x41d3be]
[E 03/14 09:31] [bt] /afs/ld/software/sys/sbin/pvfs2-server(main+0xde8) [0x410dc8]
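
The error_code value in those lines can be pulled apart by hand. Here is a
rough Python sketch, assuming the bit layout recalled from PVFS2's
pvfs2-types.h: an error bit at bit 30, a non-errno bit at bit 29, a 3-bit
error class at bits 7-9, and a 7-bit error number. The constant values and
the two name tables are assumptions and should be verified against the
source tree; decode_pvfs_error is a hypothetical helper, not a PVFS2
function.

```python
# Assumed layout (verify against pvfs2-types.h in your tree).
PVFS_ERROR_BIT = 1 << 30
PVFS_NON_ERRNO_ERROR_BIT = 1 << 29
ERROR_CLASS_MASK = 0x7 << 7   # assumed 3-bit class field
ERROR_NUMBER_MASK = 0x7F      # assumed 7-bit error number

# Assumed class and non-errno names; only a partial table.
CLASS_NAMES = {1: "BMI", 2: "TROVE", 3: "FLOW"}
NON_ERRNO_NAMES = {1: "PVFS_ECANCEL"}


def decode_pvfs_error(code):
    """Split a (negative) PVFS2 error code into its assumed bit fields."""
    value = -code if code < 0 else code
    error_class = (value & ERROR_CLASS_MASK) >> 7
    number = value & ERROR_NUMBER_MASK
    non_errno = bool(value & PVFS_NON_ERRNO_ERROR_BIT)
    return {
        "is_pvfs_error": bool(value & PVFS_ERROR_BIT),
        "non_errno": non_errno,
        "class": CLASS_NAMES.get(error_class, str(error_class)),
        "number": number,
        # Only non-errno numbers are named here; errno-style ones are left raw.
        "name": NON_ERRNO_NAMES.get(number) if non_errno else None,
    }


print(decode_pvfs_error(-1610613121))
```

Under those assumptions, -1610613121 comes out as a non-errno error in the
flow class with number 1, i.e. a cancellation, which is consistent with the
"job time out: cancelling flow operation" line above.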

On the client side, the best I could deduce is that there's a cyclic pattern of
"flowproto completing" and "Job flows in progress (callback time): " messages, where
the callback time goes from 6 to 0 while running "normally".  Then, in the middle of
one such cycle, we get the "fp_multiqueue_cancel" and everything goes boom.  I can make
these logs available if it would help (perhaps with some other, more useful debug mask?).

>> [E 15:49:42.697884] Error: ib_check_cq: unknown send state SQ_CANCELLED (10) of sq 0x552050.
> 
> Fixed this in mainline.  Won't help to understand why your operation was
> cancelled, though.

Will update and try again asap, probably later this week...

>> I've got 6 I/O servers and 1 metadata server, pvfs2 storage
>> is on SRP-based LUNs on a DDN array, fs.conf is also attached.
>>
>> Oh, and my pvfs2 configure options:
>>
>> ./configure --prefix=/afs/ld/software/sys \
>> 	--with-openib=/usr \
>> 	--with-openib-libs=/usr/lib64 \
>> 	--with-kernel=/usr/src/linux-2.6.16.54-0.2.3 \
>> 	--enable-shared --enable-trusted-connections \
>> 	--enable-mmap-racache --without-bmi-tcp
> 
> I think you don't need "--with-openib-libs".  And trusted
> connections only apply to BMI-TCP currently.  An interested party
> could add this pretty quickly though.

Noted, thanks.

- Dardo

