[Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client

Sam Lang slang at mcs.anl.gov
Thu Feb 23 17:44:42 EST 2006


On Feb 23, 2006, at 9:35 AM, David Metheny wrote:

> This seems to happen on a 2.6 kernel also. I'm using a 2.6.9-22 on a RHEL4
> client.
>
> I also attempted this with the network going away on a pvfs2 server node.
> I issued a
> "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node on the
> /mnt/pvfs2 file system. I went through the same process of issuing a "df"
> on the /mnt/pvfs2, getting a connection timed out, then a "df" on the
> /mnt/pvfs2-tmp, and got a connection timed out also. I watched (ping) the
> pvfs2 server node where eth0 was brought down, and immediately after eth0
> came back up, I issued a "df" on /mnt/pvfs2-tmp again. It worked at this
> point.
>

Hi David,

I get slightly different behavior.  If I create a network partition
between client and server2 nodes and then do a df -h <mnt1>, I get
an operation timed out error on the first attempt, but repeated
attempts are successful.  Also, when I do df -h <mnt2>, my error is a
little different: instead of connection timed out, I get an Invalid
argument (EINVAL).  Not sure what's up with that.  I'll keep looking
into the initial connection timed-out behavior; I just wanted to give
you an update.
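
For comparing notes, the check we are both doing by hand could be
scripted roughly as below.  This is only a sketch: the function name
check_mounts is made up for illustration, the mount points in the
usage comment are the ones from your report, and it does nothing
PVFS2-specific beyond running df and capturing what it prints on
stderr.

```shell
#!/bin/sh
# Hypothetical helper (not part of PVFS2): run "df -h" on each mount
# point given as an argument and report OK or FAIL with df's error text.
check_mounts() {
    rc=0
    for mnt in "$@"; do
        # Capture stderr only; df's normal table output is discarded.
        if err=$(df -h "$mnt" 2>&1 >/dev/null); then
            echo "OK   $mnt"
        else
            echo "FAIL $mnt: $err"
            rc=1
        fi
    done
    # Non-zero if any mount point failed.
    return $rc
}

# Example, using the mount points from the report:
#   check_mounts /mnt/pvfs2 /mnt/pvfs2-tmp
```

Running it once before and once after taking the server's interface
down should show whether a failure on the first mount carries over to
the second.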

-sam

>> -----Original Message-----
>> From: David Metheny [mailto:david.metheny at gmail.com]
>> Sent: Thursday, February 23, 2006 8:27 AM
>> To: 'Sam Lang'
>> Cc: 'pvfs2-developers at beowulf-underground.org'
>> Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2
>> file systems mounted on a single client
>>
>> I wasn't able to reproduce the problem by just killing the
>> server process. I tried both killing the server process and
>> powering off the server and the client handled errors from
>> the killing of the server process fine.
>>
>> I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see
>> if I can reproduce on a 2.6 kernel.
>>
>>> -----Original Message-----
>>> From: Sam Lang [mailto:slang at mcs.anl.gov]
>>> Sent: Wednesday, February 22, 2006 4:48 PM
>>> To: david.metheny at gmail.com
>>> Cc: pvfs2-developers at beowulf-underground.org
>>> Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file
>>> systems mounted on a single client
>>>
>>>
>>> Hi David,
>>>
>>> I tried to reproduce your results with the 2.6 kernel, and wasn't able
>>> to.  Are you using 2.4?  Also, I didn't actually pull the plug on one
>>> of the nodes, I just killed the server, but that should be close
>>> enough to your test case unless you're routing stuff through that node
>>> ;-).
>>>
>>> -sam
>>>
>>> On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
>>>
>>>> It appears the error described below will span across other mounted
>>>> file systems on a client when encountered, until the client software
>>>> is reloaded.
>>>>
>>>>
>>>> I've got a client with 2 pvfs2 file systems mounted:
>>>>
>>>> 	/mnt/pvfs2
>>>> 	/mnt/pvfs2-tmp
>>>>
>>>> Both PVFS2 file system configurations contained the following when
>>>> mounted:
>>>>         ServerJobBMITimeoutSecs 30
>>>>         ServerJobFlowTimeoutSecs 30
>>>>         ClientJobBMITimeoutSecs 300
>>>>         ClientJobFlowTimeoutSecs 300
>>>>         ClientRetryLimit 5
>>>>         ClientRetryDelayMilliSecs 2000
>>>>
>>>> I've dynamically changed the client's timeout settings after the
>>>> mounts:
>>>> 	[root@serenity root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
>>>>
>>>> A pvfs2 server node lost power on the /mnt/pvfs2 file system. After
>>>> issuing a "df -h /mnt/pvfs2", the client received a "connection
>>>> timed-out" error.
>>>>
>>>> 	[root@serenity root]# df -h /mnt/pvfs2
>>>> 	Filesystem            Size  Used Avail Use% Mounted on
>>>> 	df: `/mnt/pvfs2': Connection timed out
>>>>
>>>> An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned
>>>> "connection timed out"
>>>> 	[root@serenity root]# df -h /mnt/pvfs2-tmp
>>>> 	df: `/mnt/pvfs2-tmp': Connection timed out
>>>>
>>>> An unmount of the /mnt/pvfs2 shared works fine.
>>>> 	[root@serenity root]# umount /mnt/pvfs2
>>>>
>>>> Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
>>>> timed out"
>>>> 	[root@serenity root]# df -h /mnt/pvfs2-tmp
>>>> 	df: `/mnt/pvfs2-tmp': Connection timed out
>>>>
>>>> After unloading the userspace and kernel module, restarting pvfs2
>>>> software, and remounting the /mnt/pvfs2-tmp filesystem, a "df -h
>>>> /mnt/pvfs2-tmp"
>>>> successfully completed
>>>> 	[root@serenity root]# df -h /mnt/pvfs2-tmp
>>>> 	Filesystem            Size  Used Avail Use% Mounted on
>>>> 	hostname:3334/pvfs2-fs
>>>> 	                      1.9T  381G  1.6T  20% /mnt/pvfs2-tmp
>>>>
>>>>
>>>> The pvfs2 client log contained:
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
>>>> [E 02/22 11:29] *** Out of retries.
>>>> [E 02/22 11:29] Statfs failed: Connection refused
>>>> [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
>>>> [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
>>>> [E 02/22 11:42] msgpair failed, will retry:: Connection timed out
>>>>


