[Pvfs2-users] Timeouts while reading from our pvfs2-system client
kschoche at gmail.com
Tue Jun 12 14:53:49 EDT 2012
Hi Vlad -
Randy had done some work to work around this; I was confused about what
he had actually done because I thought it addressed something else! At
any rate, can you try out the stable branch and see if the changes help?
If they don't work, we'll start working on it from there.
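For reference, the read-back test I have in mind is sketched below. The binary path, test file, and block sizes are taken from the earlier mails in this thread (adjust them for your install); the commands are echoed rather than executed, so the sketch is safe to review before pointing it at a live file system:

```shell
#!/bin/sh
# Sketch of the read-back tests discussed in this thread. Paths follow
# vlad's earlier output; treat them as placeholders for other installs.
PVFS2_CP=/share/apps/orangefs/bin/pvfs2-cp
FILE=/scratchfs/testfile-100GB.dump

for BS in 4096k 8192k; do
    # -b sets the per-transfer block size, as in the failing 8192k run;
    # the commands are printed (not run) so they can be reviewed first.
    echo "time $PVFS2_CP -b $BS $FILE /dev/null"
done
```

Running both block sizes back to back should tell us whether the RTS_DONE failure is tied to the 8 MB transfer size or shows up regardless.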
On Tue, Jun 12, 2012 at 12:19 PM, vlad <vlad at cosy.sbg.ac.at> wrote:
> Hi Kyle!
>> Hi vlad, this is a new one for me, and similar issues rarely occur under
>> relatively low loads like 1 GB/s in my experience. Are you able to
>> reproduce it by using pvfs2-cp /input/file /dev/null and specifying -b
>> to set the block size? If this is what I think it is, you shouldn't have
>> any associated errors on the server side. Can you verify?
> Yeah, that is and was true...
> Here is the new faulty output:
> [root at doppler18 ~]# time /share/apps/orangefs/bin/pvfs2-cp -b 8192k
> /scratchfs/testfile-100GB.dump /dev/null
> [E 17:24:41.214472] Error: encourage_recv_incoming: mop_id 7fdd60000950 in
> RTS_DONE message not found.
> [E 17:24:41.223019] [bt] /share/apps/orangefs/bin/pvfs2-cp(error+0xca)
> [E 17:24:41.223036] [bt] /share/apps/orangefs/bin/pvfs2-cp() [0x465d64]
> [E 17:24:41.223044] [bt] /share/apps/orangefs/bin/pvfs2-cp() [0x467b05]
> [E 17:24:41.223052] [bt]
> /share/apps/orangefs/bin/pvfs2-cp(BMI_testcontext+0xf3) [0x4549c3]
> [E 17:24:41.223060] [bt]
> [E 17:24:41.223068] [bt] /share/apps/orangefs/bin/pvfs2-cp() [0x455aca]
> [E 17:24:41.223075] [bt]
> /share/apps/orangefs/bin/pvfs2-cp(job_testcontext+0x12a) [0x4562ba]
> [E 17:24:41.223082] [bt]
> [E 17:24:41.223090] [bt]
> [E 17:24:41.223098] [bt]
> /share/apps/orangefs/bin/pvfs2-cp(PVFS_sys_io+0xae) [0x420ffe]
> [E 17:24:41.223105] [bt] /share/apps/orangefs/bin/pvfs2-cp(main+0x3b2)
> real 0m13.389s
> user 0m13.001s
> sys 0m0.024s
> That should have been about 100 GB... but it wasn't.
> Tomorrow I will run a test with the block size reduced to 4 MB, copying
> that 100 GB file, and post the results again.
> Mind you, a 10 GB copy (block size 8 MB) went through just now without
> errors.
> I forgot to tell you that our nodes each have 2x Opteron 6200 CPUs with
> 64 GB of RAM installed, so there is some caching involved. Also, there
> are active jobs running on my nodes at present.
>> More info to come once I get into the office.
> Great! Thanks !
>> On Jun 12, 2012 7:52 AM, "vlad" <vlad at cosy.sbg.ac.at> wrote:
>>> We are evaluating OrangeFS 2.8.6 with QDR InfiniBand on Rocks cluster
>>> suite 6.0 (based on CentOS 6.x), and I have set up 8 nodes (doppler14-20
>>> and doppler22). Each node is a metadata server, storage server, and
>>> client. Connection is made via ib://doppler18:3335/pvfs2-fs. The file
>>> system is mounted to /scratchfs via the kernel interface (pvfs2.ko).
>>> Our kernel version is "2.6.32-220.13.1.el6.x86_64".
>>> We get very impressive transfer rates (600-800 MB/s) when we dump
>>> very big files (1 TB) onto the file system (dd if=/dev/zero
>>> of=/scratchfs/testfile.dump bs=8192K), but when reading the dump back,
>>> the client core collapses and our /scratchfs becomes inaccessible.
>>> Using pvfs2fuse does not improve the situation: we get a socket error
>>> (usually after dumping 1 GB of data, sometimes earlier, sometimes
>>> later), and the pvfs2fuse mount point also becomes inaccessible.
>>> I found this in one of our client log files:
>>> [E 14:22:23.279365] Error: encourage_recv_incoming: mop_id 7f6ce4000950
>>> RTS_DONE message not found.
>>> [E 14:22:23.292947] [bt] pvfs2-client-core(error+0xca) [0x46f91a]
>>> [E 14:22:23.292978] [bt] pvfs2-client-core() [0x46ccc4]
>>> [E 14:22:23.292999] [bt] pvfs2-client-core() [0x46ea65]
>>> [E 14:22:23.293018] [bt] pvfs2-client-core(BMI_testcontext+0xf3)
>>> [E 14:22:23.293037] [bt]
>>> pvfs2-client-core(PINT_thread_mgr_bmi_push+0x159) [0x4608a9]
>>> [E 14:22:23.293056] [bt] pvfs2-client-core() [0x45c9aa]
>>> [E 14:22:23.293074] [bt] pvfs2-client-core(job_testcontext+0x12a)
>>> [E 14:22:23.293092] [bt]
>>> pvfs2-client-core(PINT_client_state_machine_testsome+0xee) [0x41757e]
>>> [E 14:22:23.293111] [bt] pvfs2-client-core() [0x412ecd]
>>> [E 14:22:23.293130] [bt] pvfs2-client-core(main+0x703) [0x413fb3]
>>> [E 14:22:23.293165] [bt] /lib64/libc.so.6(__libc_start_main+0xfd)
>>> [E 14:22:23.303725] pvfs2-client-core with pid 29108 exited with value
>>> I have not found any evidence of this error in the server log files,
>>> though.
>>> This is the content of our /etc/pvfs2tab:
>>> "ib://doppler18:3335/pvfs2-fs /scratchfs pvfs2 defaults,noauto 0 0"
>>> Can you please help me stabilize read access to our files?
>>> Greetings from Salzburg/Austria/Europe
>>> Vlad Popa
>>> University of Salzburg
>>> Dept Of Computer Science-HPC Computing
>>> 5020 Salzburg
>>> Tel 0043-662-80446313
>>> mail: vlad at cosy.sbg.ac.at
>>> Pvfs2-users mailing list
>>> Pvfs2-users at beowulf-underground.org