[Pvfs2-users] PVFS2 over SSD page allocation error
Phil Carns
carns at mcs.anl.gov
Wed Feb 15 10:48:29 EST 2012
Hi Vish and Becky,
This particular error message doesn't look like it has much to do with
the SSD. It looks more like a network/memory problem that PVFS just
happens to trigger when running your workload. The call trace shows the
allocation failing in ipoib_cm_alloc_rx_skb, so I think the kernel is
having a hard time allocating the right kind of memory for a network
receive buffer over IPoIB. Here is a similar report from another
project:
http://old.nabble.com/Page-allocation-failure-td32319877.html
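To make the numbers in the trace concrete, here is a rough sketch that compares the free memory reported for each zone against the floor an atomic allocation (mode:0x20) is allowed to dip to. It assumes 4 kB pages and that the watermark is reduced the way I understand 2.6.30-era kernels to do it (halved for high-priority requests, then cut by a further quarter for callers that cannot sleep). The function names are made up for illustration; the kB figures are copied from the "Node 0 DMA" and "Node 0 DMA32" lines of the quoted log.

```python
PAGE_KB = 4  # assumption: 4 kB pages, as on x86_64


def atomic_watermark_pages(min_pages: int) -> int:
    """Floor (in pages) that an atomic request may dip to.

    Assumption: approximates the reductions applied to the 'min'
    watermark for high-priority, non-sleeping allocations.
    """
    floor = min_pages
    floor -= floor // 2  # high-priority (__GFP_HIGH) reduction
    floor -= floor // 4  # further reduction for callers that cannot sleep
    return floor


def allocation_would_fail(free_kb: int, min_kb: int,
                          lowmem_reserve_pages: int = 0) -> bool:
    """True if the zone has too few free pages for an order-0 atomic request."""
    free_pages = free_kb // PAGE_KB
    floor = atomic_watermark_pages(min_kb // PAGE_KB)
    return free_pages <= floor + lowmem_reserve_pages


# DMA32 zone from the trace: free:2116kB min:5716kB, lowmem_reserve 0
print("DMA32 fails:", allocation_would_fail(2116, 5716))
# DMA zone from the trace: free:8008kB min:16kB, lowmem_reserve 2003 pages
print("DMA fails:  ", allocation_would_fail(8008, 16, 2003))
```

Under those assumptions both zones come out below their floor, which is consistent with the order-0 failure in the log: nearly all of the 2 GB is sitting in inactive_file (page cache), and an allocation made from interrupt context cannot wait for it to be reclaimed.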
I don't know exactly how to fix the problem, but you might want to look
for ways to keep the memory usage down. One possibility is to stop the
servers, change the "TroveMethod" configuration parameter in the config
file to "TroveMethod directio", and start the servers back up. This
will cause PVFS to use O_DIRECT for all of its I/O, which will at least
prevent memory from being taken up by the Linux buffer cache. That is
also likely to be a faster method of performing I/O for a high-end SSD
anyway, regardless of the memory impact.
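As an illustration of the config change, here is a small sketch that rewrites an existing TroveMethod line in the text of a server config file. The helper name set_trove_method is hypothetical, and the sample text (a StorageHints section with "TroveMethod alt-aio") is only an assumed layout for demonstration; check where TroveMethod appears in your own file. Run something like this only while the servers are stopped, and keep a backup of the original file.

```python
import re


def set_trove_method(config_text: str, method: str = "directio") -> str:
    """Return config_text with every existing TroveMethod line set to `method`.

    Hypothetical helper: it only edits lines that are already present and
    raises if there are none, rather than guessing where to add one.
    """
    pattern = re.compile(r"^([ \t]*TroveMethod[ \t]+)\S+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + method, config_text)
    if count == 0:
        raise ValueError("no TroveMethod line found; add one by hand")
    return new_text


# Assumed sample layout, not taken from the poster's actual config file.
SAMPLE = """\
<StorageHints>
    TroveSyncMeta yes
    TroveSyncData no
    TroveMethod alt-aio
</StorageHints>
"""

print(set_trove_method(SAMPLE))
```

The same edit is easy to make by hand in an editor; the point is only that the TroveMethod value is the single line that changes.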
thanks,
-Phil
On 02/15/2012 10:27 AM, Becky Ligon wrote:
> Also, which kernel version and which SSD card are you using?
>
> Becky
>
> On Wed, Feb 15, 2012 at 10:18 AM, Becky Ligon <ligon at omnibond.com
> <mailto:ligon at omnibond.com>> wrote:
>
> Vish:
>
> I have not figured out why you are getting this error. My
> co-worker, who has installed the server on a SSD, never saw
> this problem. I will give it a try on our machines and see
> what happens. Can you send me your OrangeFS configuration
> file and the version of OrangeFS that you are using?
>
> Thanks,
> Becky
>
>
> On Tue, Feb 14, 2012 at 4:45 PM, Vishwanath Venkatesan
> <vvenkates at gmail.com <mailto:vvenkates at gmail.com>> wrote:
>
> Hi Becky,
> I sent this email a long time ago and still have a question
> about it. Can you tell me what this error means? It looks
> like an overflow to me. Any insight into why the error
> could occur would help.
> Please let me know.
>
>
>
> Thanks
> Vish
> On Tue, Jan 3, 2012 at 1:02 PM, Becky Ligon
> <ligon at omnibond.com <mailto:ligon at omnibond.com>> wrote:
>
> That is interesting. We have not tried to run the
> server on SSD, so there may be differences in
> allocation between SSDs and hard drives. We will have
> to investigate. Can you tell me which version of the
> code you are using? If you run pvfs2-server
> --version, the version will be displayed.
>
> Thanks,
> Becky
>
> On Tue, Jan 3, 2012 at 12:49 PM, Vishwanath Venkatesan
> <vvenkates at gmail.com <mailto:vvenkates at gmail.com>> wrote:
>
> Hi,
>
> We have a pvfs2 filesystem over 2TB of SSD storage.
> There are 2 pvfs2 servers, each mounted over its own
> 1TB section of the storage, and 16 compute nodes
> acting as pvfs2 clients. When I wrote 65G from one
> compute node to the file system and watched the log,
> there were some page allocation errors. Although the
> write completed successfully, I suspect this might
> pull down the performance of the PVFS2 filesystem.
> I have provided the trace; any insight from pvfs2
> experts would be really helpful.
>
> The error trace looked like this:
> ########################################
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355862] pvfs2-server: page allocation
> failure. order:0, mode:0x20
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355868] Pid: 24210, comm: pvfs2-server
> Not tainted 2.6.30-perfctr #8
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355871] Call Trace:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355873] <IRQ> [<ffffffff802b384d>]
> __alloc_pages_internal+0x39d/0x490
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355887] [<ffffffff802dc332>]
> alloc_pages_current+0x82/0xd0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355904] [<ffffffffa03c7a2f>]
> ipoib_cm_alloc_rx_skb+0xdf/0x460 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355908] [<ffffffff802126e0>] ?
> nommu_map_page+0x0/0xd0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355916] [<ffffffffa03c95d7>]
> ipoib_cm_handle_rx_wc+0x287/0x730 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355922] [<ffffffffa03c2144>]
> ipoib_poll+0xe4/0x1c0 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355927] [<ffffffff804dc4a7>]
> net_rx_action+0x117/0x1d0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355932] [<ffffffff8024fff4>]
> __do_softirq+0x84/0x210
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355935] [<ffffffff8020d0ac>]
> call_softirq+0x1c/0x30
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355937] [<ffffffff8020e84d>]
> do_softirq+0x3d/0x80
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355940] [<ffffffff8025027d>]
> irq_exit+0x8d/0x90
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355942] [<ffffffff8020e565>]
> do_IRQ+0x85/0xf0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355947] [<ffffffff8020c913>]
> ret_from_intr+0x0/0xa
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355948] <EOI> [<ffffffff802b9146>] ?
> shrink_page_list+0x686/0x820
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355955] [<ffffffff8020c90e>] ?
> common_interrupt+0xe/0x13
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355958] [<ffffffff802b98e8>] ?
> shrink_list+0x1f8/0x5d0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355961] [<ffffffff802ba240>] ?
> shrink_zone+0x240/0x360
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355966] [<ffffffff8026a477>] ?
> getnstimeofday+0x57/0xe0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355968] [<ffffffff802ba88e>] ?
> try_to_free_pages+0x27e/0x430
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355971] [<ffffffff802b8080>] ?
> isolate_pages_global+0x0/0x2a0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355975] [<ffffffff802b36ac>] ?
> __alloc_pages_internal+0x1fc/0x490
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355979] [<ffffffff802dc332>] ?
> alloc_pages_current+0x82/0xd0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355982] [<ffffffff802b01be>] ?
> __get_free_pages+0xe/0x80
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355985] [<ffffffff8024820d>] ?
> copy_process+0xbd/0x13d0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355988] [<ffffffff802495d0>] ?
> do_fork+0x80/0x400
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355991] [<ffffffff8020a9d3>] ?
> sys_clone+0x23/0x30
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355994] [<ffffffff8020c2d3>] ?
> stub_clone+0x13/0x20
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.355998] [<ffffffff8020bf6b>] ?
> system_call_fastpath+0x16/0x1b
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356000] Mem-Info:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356002] Node 0 DMA per-cpu:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356005] CPU 0: hi: 0, btch: 1
> usd: 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356007] CPU 1: hi: 0, btch: 1
> usd: 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356008] Node 0 DMA32 per-cpu:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356011] CPU 0: hi: 186, btch: 31
> usd: 75
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356013] CPU 1: hi: 186, btch: 31
> usd: 54
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356017] Active_anon:2303 active_file:4218
> inactive_anon:4108
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356018] inactive_file:464726
> unevictable:0 dirty:48321 writeback:0 unstable:0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356019] free:2531 slab:21979 mapped:1458
> pagetables:538 bounce:0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356021] Node 0 DMA free:8008kB min:16kB
> low:20kB high:24kB active_anon:0kB
> inactive_anon:0kB active_file:536kB
> inactive_file:152kB unevictable:0kB present:6744kB
> pages_scanned:0 all_unreclaimable? no
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356027] lowmem_reserve[]: 0 2003 2003 2003
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356030] Node 0 DMA32 free:2116kB
> min:5716kB low:7144kB high:8572kB
> active_anon:9212kB inactive_anon:16432kB
> active_file:16336kB inactive_file:1858752kB
> unevictable:0kB present:2051244kB
> pages_scanned:129 all_unreclaimable? no
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356035] lowmem_reserve[]: 0 0 0 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356038] Node 0 DMA: 2*4kB 14*8kB 7*16kB
> 3*32kB 2*64kB 3*128kB 2*256kB 1*512kB 2*1024kB
> 0*2048kB 1*4096kB = 8008kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356046] Node 0 DMA32: 0*4kB 1*8kB 0*16kB
> 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 2*1024kB
> 0*2048kB 0*4096kB = 2056kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356054] 472673 total pagecache pages
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356055] 3606 pages in swap cache
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356057] Swap cache stats: add 1626590,
> delete 1622984, find 11678569/11742820
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356059] Free swap = 2073400kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.356060] Total swap = 2095096kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368134] 524016 pages RAM
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368136] 9162 pages reserved
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368137] 469049 pages shared
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368139] 42601 pages non-shared
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368170] kswapd0: page allocation failure.
> order:0, mode:0x20
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368174] Pid: 24, comm: kswapd0 Not
> tainted 2.6.30-perfctr #8
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368175] Call Trace:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368177] <IRQ> [<ffffffff802b384d>]
> __alloc_pages_internal+0x39d/0x490
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368187] [<ffffffff802dc332>]
> alloc_pages_current+0x82/0xd0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368197] [<ffffffffa03c7a2f>]
> ipoib_cm_alloc_rx_skb+0xdf/0x460 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368202] [<ffffffff802126e0>] ?
> nommu_map_page+0x0/0xd0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368208] [<ffffffffa03c95d7>]
> ipoib_cm_handle_rx_wc+0x287/0x730 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368215] [<ffffffffa03c2144>]
> ipoib_poll+0xe4/0x1c0 [ib_ipoib]
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368219] [<ffffffff804dc4a7>]
> net_rx_action+0x117/0x1d0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368224] [<ffffffff8024fff4>]
> __do_softirq+0x84/0x210
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368227] [<ffffffff8020d0ac>]
> call_softirq+0x1c/0x30
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368229] [<ffffffff8020e84d>]
> do_softirq+0x3d/0x80
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368232] [<ffffffff8025027d>]
> irq_exit+0x8d/0x90
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368234] [<ffffffff8020e565>]
> do_IRQ+0x85/0xf0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368239] [<ffffffff8020c913>]
> ret_from_intr+0x0/0xa
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368240] <EOI> [<ffffffff805ab462>] ?
> thread_return+0x74/0x6e2
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368248] [<ffffffff805abae8>] ?
> schedule+0x18/0x40
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368251] [<ffffffff802bb1a9>] ?
> kswapd+0x769/0x780
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368254] [<ffffffff802b8080>] ?
> isolate_pages_global+0x0/0x2a0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368258] [<ffffffff80261b50>] ?
> autoremove_wake_function+0x0/0x40
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368261] [<ffffffff802baa40>] ?
> kswapd+0x0/0x780
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368263] [<ffffffff802baa40>] ?
> kswapd+0x0/0x780
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368266] [<ffffffff80261498>] ?
> kthread+0x58/0xa0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368269] [<ffffffff8020cfaa>] ?
> child_rip+0xa/0x20
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368272] [<ffffffff80261440>] ?
> kthread+0x0/0xa0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368275] [<ffffffff8020cfa0>] ?
> child_rip+0x0/0x20
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368276] Mem-Info:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368277] Node 0 DMA per-cpu:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368280] CPU 0: hi: 0, btch: 1
> usd: 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368282] CPU 1: hi: 0, btch: 1
> usd: 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368283] Node 0 DMA32 per-cpu:
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368286] CPU 0: hi: 186, btch: 31
> usd: 75
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368288] CPU 1: hi: 186, btch: 31
> usd: 182
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368292] Active_anon:2303 active_file:4218
> inactive_anon:4108
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368293] inactive_file:464595
> unevictable:0 dirty:48321 writeback:0 unstable:0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368294] free:2531 slab:21979 mapped:1458
> pagetables:538 bounce:0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368296] Node 0 DMA free:8008kB min:16kB
> low:20kB high:24kB active_anon:0kB
> inactive_anon:0kB active_file:536kB
> inactive_file:152kB unevictable:0kB present:6744kB
> pages_scanned:0 all_unreclaimable? no
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368301] lowmem_reserve[]: 0 2003 2003 2003
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368304] Node 0 DMA32 free:2116kB
> min:5716kB low:7144kB high:8572kB
> active_anon:9212kB inactive_anon:16432kB
> active_file:16336kB inactive_file:1858228kB
> unevictable:0kB present:2051244kB
> pages_scanned:257 all_unreclaimable? no
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368310] lowmem_reserve[]: 0 0 0 0
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368312] Node 0 DMA: 2*4kB 14*8kB 7*16kB
> 3*32kB 2*64kB 3*128kB 2*256kB 1*512kB 2*1024kB
> 0*2048kB 1*4096kB = 8008kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368321] Node 0 DMA32: 0*4kB 1*8kB 0*16kB
> 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 2*1024kB
> 0*2048kB 0*4096kB = 2056kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368329] 472549 total pagecache pages
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368330] 3606 pages in swap cache
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368332] Swap cache stats: add 1626590,
> delete 1622984, find 11678569/11742820
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368334] Free swap = 2073400kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.368335] Total swap = 2095096kB
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.380288] 524016 pages RAM
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.380290] 9162 pages reserved
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.380291] 469017 pages shared
> Dec 27 15:53:46 ioserver-02 kernel:
> [3026630.380292] 42601 pages non-shared
> ###############################################################################
>
>
> Thanks
> Vish
>
>
>
>
> _______________________________________________
> Pvfs2-users mailing list
> Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>
>
>
> --
> Becky Ligon
> OrangeFS Support and Development
> Omnibond Systems
> Anderson, South Carolina
>