[Pvfs2-users] OpenIB/kernel interface: null pointer dereference in put_back_slot

Tad Kollar Thaddeus.J.Kollar at nasa.gov
Mon Mar 19 08:44:09 EST 2007


Hi,

We have PathScale InfiniPath QLE7140 cards using the driver that's
included with the 2.6.20.x kernels. The machines in question have two
dual-core Opteron 280 processors, SuperMicro H8DCE-HTe mainboards, and
4GB of ECC DDR400. I've tried pvfs-2.6.2 with kernel 2.6.20.1, and the
latest CVS with kernel 2.6.20.3.

When testing with bonnie (not the preferred benchmark, I realize), a
filesystem mounted over the TCP interface works fine. When it is mounted
over the IB interface, however, the kernel reports a NULL pointer
dereference in put_back_slot within one or two test attempts (complete
reports below). When the test over the OpenIB interface does complete,
it's up to 2.5 times faster than gigabit TCP, so we're very interested
in making use of it.

I've attached the pvfs2 fs config file.

The test is run with:
bonnie -s 8G:1024k -f -n 0

Please let me know if you need any more info...

Thanks!
Tad

Error report using kernel 2.6.20.1 with pvfs-2.6.2:

Mar 16 11:03:24 gx00 kernel: pvfs2: pvfs2_file_read -- wait timed out; aborting attempt.
Mar 16 11:03:41 gx00 kernel: pvfs2: pvfs2_lookup -- wait timed out; aborting attempt.
Mar 16 11:03:44 gx00 kernel: pvfs2: pvfs2_cancel -- wait timed out; aborting attempt.
Mar 16 11:03:44 gx00 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881e061b>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel: PGD 192c8067 PUD 70ca0067 PMD 0
Mar 16 11:03:44 gx00 kernel: Oops: 0002 [1] SMP
Mar 16 11:03:44 gx00 kernel: CPU 3
Mar 16 11:03:44 gx00 kernel: Modules linked in: pvfs2 binfmt_misc ppdev parport_pc lp parport thermal fan button processor ac battery autofs4 ib_ipoib ib_umad ib_uverbs md_mod rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipv6 ext2 mbcache dm_snapshot dm_mirror dm_mod w83627hf_wdt w83627hf eeprom adm1026 hwmon_vid i2c_isa tsdev i2c_nforce2 k8temp psmouse serio_raw ib_ipath ib_core pcspkr ehci_hcd ohci_hcd evdev fbcon tileblit font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb
Mar 16 11:03:44 gx00 kernel: Pid: 22025, comm: bonnie Not tainted 2.6.20.1-opteron #1
Mar 16 11:03:44 gx00 kernel: RIP: 0010:[<ffffffff881e061b>]  [<ffffffff881e061b>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel: RSP: 0018:ffff810017189d08  EFLAGS: 00010247
Mar 16 11:03:44 gx00 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000002242660
Mar 16 11:03:44 gx00 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff881f11c8
Mar 16 11:03:44 gx00 kernel: RBP: ffff810017189d28 R08: 0000000000000040 R09: 0000000000000003
Mar 16 11:03:44 gx00 kernel: R10: ffffffff80696740 R11: ffffffff80219210 R12: ffff810017189e68
Mar 16 11:03:44 gx00 kernel: R13: ffff810017189ed8 R14: 0000000000000001 R15: ffff810017189e18
Mar 16 11:03:44 gx00 kernel: FS:  00002aaaab28eb00(0000) GS:ffff81011fd3a8c0(0000) knlGS:00000000f7e9e6b0
Mar 16 11:03:44 gx00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 16 11:03:44 gx00 kernel: CR2: 0000000000000000 CR3: 0000000026fcf000 CR4: 00000000000006e0
Mar 16 11:03:44 gx00 kernel: Process bonnie (pid: 22025, threadinfo ffff810017188000, task ffff81007f85d800)
Mar 16 11:03:44 gx00 kernel: Stack:  ffff81008e94a338 ffff81009c9d42f8 0000000000100000 ffffffff881e06d7
Mar 16 11:03:44 gx00 kernel:  ffff810000000005 0000000000000000 ffffffff881f11c8 ffffffff881f11d0
Mar 16 11:03:44 gx00 kernel:  0000000000000001 ffffffff881dc18e ffff81011afdd000 ffffffff00008003
Mar 16 11:03:44 gx00 kernel: Call Trace:
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881e06d7>] :pvfs2:pvfs_bufmap_put+0x37/0x40
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881dc18e>] :pvfs2:do_direct_readv_writev+0x9be/0xd70
Mar 16 11:03:44 gx00 kernel:  [<ffffffff8022e6a9>] release_console_sem+0x1e9/0x240
Mar 16 11:03:44 gx00 kernel:  [<ffffffff802433d9>] remove_wait_queue+0x19/0x60
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881dd955>] :pvfs2:pvfs2_file_read+0xe5/0x120
Mar 16 11:03:44 gx00 kernel:  [<ffffffff8028411b>] vfs_read+0xdb/0x1a0
Mar 16 11:03:44 gx00 kernel:  [<ffffffff80284633>] sys_read+0x53/0x90
Mar 16 11:03:44 gx00 kernel:  [<ffffffff80209bbe>] system_call+0x7e/0x83
Mar 16 11:03:44 gx00 kernel:
Mar 16 11:03:44 gx00 kernel:
Mar 16 11:03:44 gx00 kernel: Code: c7 04 90 00 00 00 00 48 8b 45 10 c7 00 01 00 00 00 48 8b 7d
Mar 16 11:03:44 gx00 kernel: RIP  [<ffffffff881e061b>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel:  RSP <ffff810017189d08>
Mar 16 11:03:44 gx00 kernel: CR2: 0000000000000000
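
Both reports share the same fault signature: "Oops: 0002" with CR2 = 0000000000000000. As a minimal sketch (decode_oops is a hypothetical helper, not part of PVFS2 or the kernel), the Python below decodes that error code using the x86 page-fault error-code bits. For 0002 it yields a kernel-mode write to a page that is not present, at address 0. The first-page test assumes 4 KiB pages, as on x86_64, which is also the threshold the kernel uses when it words the message as a "NULL pointer dereference". Assuming the Code: bytes begin at RIP, c7 04 90 00 00 00 00 is a 32-bit store of zero to (%rax,%rdx,4), and RAX and RDX are both zero in each register dump, which is consistent with put_back_slot writing through a NULL base pointer.

```python
# Hypothetical helper: decode the x86 page-fault error code from an
# "Oops: NNNN" line together with the faulting address from CR2.

# (bit, meaning when set, meaning when clear) for the first three bits
PF_MODE_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]
# Bits that are only reported when set
PF_EXTRA_BITS = [
    (0x08, "reserved bit set in a page-table entry"),
    (0x10, "instruction fetch"),
]
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64


def decode_oops(oops_code: str, cr2: str) -> list:
    """Return readable facts for the hex Oops code and CR2 value."""
    code = int(oops_code, 16)  # the kernel prints this field in hex
    facts = [on if code & bit else off for bit, on, off in PF_MODE_BITS]
    facts += [desc for bit, desc in PF_EXTRA_BITS if code & bit]
    address = int(cr2, 16)
    facts.append(f"faulting address 0x{address:x}")
    if address < PAGE_SIZE:
        facts.append("address lies in the first page (NULL pointer plus offset)")
    return facts


if __name__ == "__main__":
    # "Oops: 0002" and "CR2: 0000000000000000" appear in both reports.
    for fact in decode_oops("0002", "0000000000000000"):
        print(fact)
```

For the values in these reports it prints "page not present", "write access", "kernel mode", "faulting address 0x0", and the first-page note.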

Error report using 2.6.20.3 with CVS updated on 3/16:

Mar 16 14:44:03 gx00 kernel: pvfs2: module version 2.6.2pre1-2007-03-16-175232 loaded
Mar 16 14:45:40 gx00 kernel: pvfs2: pvfs2_file_read -- wait timed out; aborting attempt.
Mar 16 14:46:00 gx00 kernel: pvfs2: pvfs2_cancel -- wait timed out; aborting attempt.
Mar 16 14:46:00 gx00 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Mar 16 14:46:00 gx00 kernel:  [<ffffffff882555db>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel: PGD 6bdd2067 PUD 6bc8d067 PMD 0
Mar 16 14:46:00 gx00 kernel: Oops: 0002 [1] SMP
Mar 16 14:46:00 gx00 kernel: CPU 0
Mar 16 14:46:00 gx00 kernel: Modules linked in: pvfs2 binfmt_misc ppdev parport_pc lp parport thermal fan button processor ac battery autofs4 ib_ipoib ib_umad ib_uverbs md_mod ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipv6 ext2 mbcache dm_snapshot dm_mirror dm_mod w83627hf_wdt w83627hf eeprom adm1026 hwmon_vid i2c_isa tsdev ib_ipath ib_core e1000 ehci_hcd k8temp psmouse serio_raw pcspkr ohci_hcd i2c_nforce2 evdev fbcon tileblit font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb
Mar 16 14:46:00 gx00 kernel: Pid: 13759, comm: bonnie Not tainted 2.6.20.3-opteron #1
Mar 16 14:46:00 gx00 kernel: RIP: 0010:[<ffffffff882555db>]  [<ffffffff882555db>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel: RSP: 0018:ffff81007487bd08  EFLAGS: 00010247
Mar 16 14:46:00 gx00 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000001264360
Mar 16 14:46:00 gx00 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff88266188
Mar 16 14:46:00 gx00 kernel: RBP: ffff81007487bd28 R08: 0000000000000000 R09: 0000000000000000
Mar 16 14:46:00 gx00 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007487be68
Mar 16 14:46:00 gx00 kernel: R13: ffff81007487bed8 R14: 0000000000000001 R15: ffff81007487be18
Mar 16 14:46:00 gx00 kernel: FS:  00002aaaab28eb00(0000) GS:ffffffff80627000(0000) knlGS:0000000000000000
Mar 16 14:46:00 gx00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 16 14:46:00 gx00 kernel: CR2: 0000000000000000 CR3: 000000007cb75000 CR4: 00000000000006e0
Mar 16 14:46:00 gx00 kernel: Process bonnie (pid: 13759, threadinfo ffff81007487a000, task ffff81007fc9a1c0)
Mar 16 14:46:00 gx00 kernel: Stack:  ffff81007cac2538 ffff8100541344b8 0000000000100000 ffffffff88255697
Mar 16 14:46:00 gx00 kernel:  ffff810000000005 0000000000000000 ffffffff88266188 ffffffff88266190
Mar 16 14:46:00 gx00 kernel:  0000000000000001 ffffffff8825118e 00002aaaab290010 0000000000008003
Mar 16 14:46:00 gx00 kernel: Call Trace:
Mar 16 14:46:00 gx00 kernel:  [<ffffffff88255697>] :pvfs2:pvfs_bufmap_put+0x37/0x40
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8825118e>] :pvfs2:do_direct_readv_writev+0x9be/0xd70
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8028bfaa>] permission+0xca/0x140
Mar 16 14:46:00 gx00 kernel:  [<ffffffff802435b9>] remove_wait_queue+0x19/0x60
Mar 16 14:46:00 gx00 kernel:  [<ffffffff88252955>] :pvfs2:pvfs2_file_read+0xe5/0x120
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8028432b>] vfs_read+0xdb/0x1a0
Mar 16 14:46:00 gx00 kernel:  [<ffffffff80284843>] sys_read+0x53/0x90
Mar 16 14:46:00 gx00 kernel:  [<ffffffff80209bbe>] system_call+0x7e/0x83
Mar 16 14:46:00 gx00 kernel:
Mar 16 14:46:00 gx00 kernel:
Mar 16 14:46:00 gx00 kernel: Code: c7 04 90 00 00 00 00 48 8b 45 10 c7 00 01 00 00 00 48 8b 7d
Mar 16 14:46:00 gx00 kernel: RIP  [<ffffffff882555db>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel:  RSP <ffff81007487bd08>
Mar 16 14:46:00 gx00 kernel: CR2: 0000000000000000
Mar 16 14:46:20 gx00 kernel:  pvfs2: pvfs2_inode_setattr -- wait timed out; aborting attempt.
Mar 16 14:47:03 gx00 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out; aborting attempt.
Mar 16 14:47:43 gx00 last message repeated 2 times
Mar 16 14:48:03 gx00 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out; aborting attempt.

-------------- next part --------------
<Defaults>
	UnexpectedRequests 50
	LogFile /var/log/pvfs2-server.log
	EventLogging none
	LogStamp datetime
	BMIModules bmi_ib,bmi_tcp
	FlowModules flowproto_multiqueue
	PerfUpdateInterval 1000
	ServerJobBMITimeoutSecs 30
	ServerJobFlowTimeoutSecs 30
	ClientJobBMITimeoutSecs 300
	ClientJobFlowTimeoutSecs 300
	ClientRetryLimit 5
	ClientRetryDelayMilliSecs 2000
	TroveMethod alt-aio
</Defaults>

<Aliases>
	Alias gx47 ib://gx47:3335,tcp://gx47:3334
	Alias gx48 ib://gx48:3335,tcp://gx48:3334
	Alias gx49 ib://gx49:3335,tcp://gx49:3334
	Alias gx50 ib://gx50:3335,tcp://gx50:3334
	Alias gx51 ib://gx51:3335,tcp://gx51:3334
	Alias gx52 ib://gx52:3335,tcp://gx52:3334
	Alias gx53 ib://gx53:3335,tcp://gx53:3334
	Alias gx54 ib://gx54:3335,tcp://gx54:3334
</Aliases>

<Filesystem>
	Name pvfs2-fs
	ID 1221584540
	RootHandle 1048576
	<MetaHandleRanges>
		Range gx47 4-536870914
		Range gx48 536870915-1073741825
	</MetaHandleRanges>
	<DataHandleRanges>
		Range gx49 1073741826-1610612736
		Range gx50 1610612737-2147483647
		Range gx51 2147483648-2684354558
		Range gx52 2684354559-3221225469
		Range gx53 3221225470-3758096380
		Range gx54 3758096381-4294967291
	</DataHandleRanges>
	<StorageHints>
		TroveSyncMeta yes
		TroveSyncData no
		TroveMethod alt-aio
	</StorageHints>
</Filesystem>
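
The handle ranges in the config above can be checked mechanically. This is a minimal sketch in Python, assuming only that ranges must not overlap (gaps are reported but may be harmless). It confirms that the eight ranges are disjoint and run contiguously from 4 to 4294967291, and that RootHandle 1048576 falls in gx47's metadata range. check_ranges and owner_of are hypothetical helpers, not PVFS2 tools.

```python
# Illustrative sanity check of the handle ranges in the attached config.
# The values are copied from <MetaHandleRanges> and <DataHandleRanges>;
# check_ranges and owner_of are hypothetical helpers, not PVFS2 tools.

RANGES = {
    "gx47": (4, 536870914),            # metadata
    "gx48": (536870915, 1073741825),   # metadata
    "gx49": (1073741826, 1610612736),  # data
    "gx50": (1610612737, 2147483647),  # data
    "gx51": (2147483648, 2684354558),  # data
    "gx52": (2684354559, 3221225469),  # data
    "gx53": (3221225470, 3758096380),  # data
    "gx54": (3758096381, 4294967291),  # data
}
ROOT_HANDLE = 1048576


def check_ranges(ranges: dict) -> list:
    """Report inverted ranges, overlaps, and gaps between adjacent ranges."""
    problems = [f"{name} is inverted" for name, (lo, hi) in ranges.items() if lo > hi]
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    for (name_a, (_, hi_a)), (name_b, (lo_b, _)) in zip(ordered, ordered[1:]):
        if lo_b <= hi_a:
            problems.append(f"{name_a} overlaps {name_b}")
        elif lo_b != hi_a + 1:
            problems.append(f"gap between {name_a} and {name_b}")
    return problems


def owner_of(handle: int, ranges: dict):
    """Return the server whose range contains handle, or None."""
    for name, (lo, hi) in ranges.items():
        if lo <= handle <= hi:
            return name
    return None


if __name__ == "__main__":
    print(check_ranges(RANGES) or "ranges are contiguous and disjoint")
    print("RootHandle owned by", owner_of(ROOT_HANDLE, RANGES))
```

Sorting by range start means each range only needs to be compared with its neighbor, so one pass over the sorted list finds every overlap or gap.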

