[Pvfs2-users] pvfs2 problem

Sam Lang slang at mcs.anl.gov
Tue Nov 18 11:26:57 EST 2008


Hi Brian,

Sorry for the delayed response!  The likely cause of your errors is that
the servers are overloaded by the clients, so the I/O operations take so
long that the clients cancel them once a timeout is reached.  If you want
to perform load tests of this kind, you can crank up the timeouts by
modifying the corresponding options in the PVFS config file.  Check out:

http://www.pvfs.org/cvs/pvfs-2-7-branch-docs/doc//pvfs-config-options.php#ClientJobFlowTimeoutSecs
http://www.pvfs.org/cvs/pvfs-2-7-branch-docs/doc//pvfs-config-options.php#ClientJobBMITimeoutSecs
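As a rough sketch of that change (not part of the PVFS tools), the Python helper below raises both client timeouts in the text of a config file. It assumes each option is written as a "Name value" line inside the <Defaults> section; check the linked docs for the exact section. The function name raise_client_timeouts and the value you pass in are hypothetical, not recommended settings.

```python
import re

# Option names are the two linked from the PVFS 2.7 config-option docs.
# Assumption: each appears as a "Name value" line inside <Defaults>.
TIMEOUT_OPTIONS = ("ClientJobBMITimeoutSecs", "ClientJobFlowTimeoutSecs")


def raise_client_timeouts(config_text: str, seconds: int) -> str:
    """Hypothetical helper: set both client job timeouts to `seconds`.

    Existing option lines are rewritten in place; a missing option is
    appended just before the first </Defaults> closing tag.
    """
    for option in TIMEOUT_OPTIONS:
        # Match only a whole "Option <digits>" line; [ \t] (not \s) keeps
        # the pattern from swallowing neighboring newlines.
        pattern = re.compile(
            rf"^([ \t]*{option}[ \t]+)\d+[ \t]*$", re.MULTILINE
        )
        config_text, count = pattern.subn(rf"\g<1>{seconds}", config_text)
        if count == 0:
            if "</Defaults>" not in config_text:
                raise ValueError(f"{option} not set and no </Defaults> found")
            config_text = config_text.replace(
                "</Defaults>", f"\t{option} {seconds}\n</Defaults>", 1
            )
    return config_text


if __name__ == "__main__":
    sample = "<Defaults>\n\tClientJobBMITimeoutSecs 300\n</Defaults>\n"
    print(raise_client_timeouts(sample, 900))
```

The helper only edits text: writing the result back to your config file and getting the servers and clients to use it are left out of the sketch.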

-sam

On Oct 10, 2008, at 10:37 AM, <brain at autistici.org> wrote:

>
> Hello,
>
> I am trying to do some "load tests" with pvfs2, but find the following
> in the logs (I produced them with 'pvfs2-set-debugmask -m /mnt/test
> "network,server,client"'):
>
> Client:
>
> [D 11:34:10.421223] [INFO]: Mapping pointer 0x2b875cb28000 for I/O.
> [D 11:34:10.433532] [INFO]: Mapping pointer 0x6a9000 for I/O.
> [E 11:40:02.941501] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 31963.
>
> Server01:
>
> [D 10/08 11:40] BMI_tcp_post_send_generic: Sent: 24 bytes of data.
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: set_info: 7570864 option: 6
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: searching for ref 7570864
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: decremented ref 7570864 to: 0
> [D 10/08 11:40] server_state_machine_complete 0x2aaab4022030
> [D 10/08 11:40] server_state_machine_terminate 0x2aaab4022030
> [D 10/08 11:40] Error: bmi_tcp: Connection reset by peer
> [D 10/08 11:40] BMI_testcontext completing: 46912585631680
> [E 10/08 11:40] handle_io_error: flow proto error cleanup started on 0x2aaab0008690: Connection reset by peer
> [E 10/08 11:40] handle_io_error: flow proto 0x2aaab0008690 canceled 0 operations, will clean up.
> [E 10/08 11:40] handle_io_error: flow proto 0x2aaab0008690 error cleanup finished: Connection reset by peer
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: set_info: 7811296 option: 6
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: searching for ref 7811296
> [D 10/08 11:40] [BMI CONTROL]: BMI_set_info: decremented ref 7811296 to: 0
> [D 10/08 11:40] [BMI CONTROL]: bmi_addr_drop: bmi discarding address: 7811296
> [D 10/08 11:40] server_state_machine_complete 0x2aaab40381d0
>
> The cluster configuration is as follows:
> - three hosts, each with a ~400Gb ext3 slice mounted from a SAN via FC,
>   acting as metadata servers, I/O servers and clients;
> - two hosts acting as clients only;
> - Debian 4.0, kernel 2.6.24, pvfs2 module 2.7.1.
>
> The hosts are connected to each other by gigabit Ethernet. I am
> mounting the filesystem on each client-only host from a different
> server: is this correct? What is the difference between mounting from
> different servers and using one server for all clients?
>
> Each server/client host, instead, mounts the filesystem from itself.
> Again, would it be better to use other hosts as servers?
>
> Last, but not least: do you have any clues about the possible cause of
> the error? I checked all the other logs, and they are perfectly clean.
> Also, pvfs2-ping doesn't report anything wrong.
>
> Please forgive me if the above questions have already been answered: I
> tried searching the mailing list archives but without success...
>
>
> Thank you very much for your kind attention!
>
>


