[Pvfs2-developers] Problem with multiple pvfs2 file systems mounted on a single client
Sam Lang
slang at mcs.anl.gov
Thu Feb 23 17:44:42 EST 2006
On Feb 23, 2006, at 9:35 AM, David Metheny wrote:
> This seems to happen on a 2.6 kernel also. I'm using a 2.6.9-22 on a RHEL4
> client.
>
> I also attempted this with the network going away on a pvfs2 server node. I
> issued a "ifdown eth0 && sleep 200 && ifup eth0" on a pvfs2 server node on
> the /mnt/pvfs2 file system. I went through the same process of issuing a
> "df" on the /mnt/pvfs2, getting a connection timed out, then a "df" on the
> /mnt/pvfs2-tmp, and got a connection timed out also. I watched (ping) the
> pvfs2 server node where eth0 was brought down, and immediately after eth0
> came back up, I issued a "df" on /mnt/pvfs2-tmp again. It worked at this
> point.
>
Hi David,
I get slightly different behavior. If I create a network partition
between client and server2 nodes and then do a df -h <mnt1>, I get
an operation timed out error on the first attempt, but repeated
attempts are successful. Also, when I do df -h <mnt2>, the error is
different: instead of connection timed out, I get Invalid
Argument (EINVAL). Not sure what's up with that. I'll keep looking
into the initial connection timed-out behavior; I just wanted to give
you an update.
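
For anyone who wants to compare the two errors side by side, here is a
minimal sketch that issues a statvfs() call against each mount point and
prints the symbolic errno name. It is not part of PVFS2: the function name
probe_mounts is made up for illustration, and it assumes df reports the
errno of the same statfs-family call, which is not verified here. The mount
points are the ones from David's report.

```python
import errno
import os


def probe_mounts(mount_points):
    """Call statvfs() on each path; return 'ok' or the symbolic errno name."""
    results = {}
    for path in mount_points:
        try:
            os.statvfs(path)
            results[path] = "ok"
        except OSError as exc:
            # Map the numeric errno (e.g. ETIMEDOUT, EINVAL) to its name.
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


if __name__ == "__main__":
    # Mount points taken from the original report; adjust for your setup.
    for path, status in probe_mounts(["/mnt/pvfs2", "/mnt/pvfs2-tmp"]).items():
        print(f"{path}: {status}")
```

Running it right after the partition is created, and again after a retry,
should show whether the second mount fails with ETIMEDOUT or EINVAL.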
-sam
>> -----Original Message-----
>> From: David Metheny [mailto:david.metheny at gmail.com]
>> Sent: Thursday, February 23, 2006 8:27 AM
>> To: 'Sam Lang'
>> Cc: 'pvfs2-developers at beowulf-underground.org'
>> Subject: RE: [Pvfs2-developers] Problem with multiple pvfs2
>> file systems mounted on a single client
>>
>> I wasn't able to reproduce the problem by just killing the
>> server process. I tried both killing the server process and
>> powering off the server, and the client handled the errors from
>> killing the server process fine.
>>
>> I was using a 2.4.21-27 kernel on a RHEL3 client... I'll see
>> if I can reproduce on a 2.6 kernel.
>>
>>> -----Original Message-----
>>> From: Sam Lang [mailto:slang at mcs.anl.gov]
>>> Sent: Wednesday, February 22, 2006 4:48 PM
>>> To: david.metheny at gmail.com
>>> Cc: pvfs2-developers at beowulf-underground.org
>>> Subject: Re: [Pvfs2-developers] Problem with multiple pvfs2 file
>>> systems mounted on a single client
>>>
>>>
>>> Hi David,
>>>
>>> I tried to reproduce your results with the 2.6 kernel, and wasn't able
>>> to. Are you using 2.4? Also, I didn't actually pull the plug on one
>>> of the nodes, I just killed the server, but that should be close
>>> enough to your test case unless you're routing stuff through that node
>>> ;-).
>>>
>>> -sam
>>>
>>> On Feb 22, 2006, at 12:16 PM, David Metheny wrote:
>>>
>>>> It appears the error described below will span across other mounted
>>>> file systems on a client when encountered, until the client software
>>>> is reloaded.
>>>>
>>>>
>>>> I've got a client with 2 pvfs2 file systems mounted:
>>>>
>>>> /mnt/pvfs2
>>>> /mnt/pvfs2-tmp
>>>>
>>>> Both PVFS2 file system configurations contained the following when
>>>> mounted:
>>>> ServerJobBMITimeoutSecs 30
>>>> ServerJobFlowTimeoutSecs 30
>>>> ClientJobBMITimeoutSecs 300
>>>> ClientJobFlowTimeoutSecs 300
>>>> ClientRetryLimit 5
>>>> ClientRetryDelayMilliSecs 2000
>>>>
>>>> I've dynamically changed the client's timeout settings after the
>>>> mounts:
>>>> [root at serenity root]# /sbin/sysctl -w pvfs2.op-timeout-secs=5
>>>>
>>>> A pvfs2 server node lost power on the /mnt/pvfs2 file system. After
>>>> issuing a "df -h /mnt/pvfs2", the client received a "connection
>>>> timed-out" error.
>>>>
>>>> [root at serenity root]# df -h /mnt/pvfs2
>>>> Filesystem Size Used Avail Use% Mounted on
>>>> df: `/mnt/pvfs2': Connection timed out
>>>>
>>>> An immediate subsequent "df -h /mnt/pvfs2-tmp" also returned
>>>> "connection timed out"
>>>> [root at serenity root]# df -h /mnt/pvfs2-tmp
>>>> df: `/mnt/pvfs2-tmp': Connection timed out
>>>>
>>>> An unmount of the /mnt/pvfs2 share works fine.
>>>> [root at serenity root]# umount /mnt/pvfs2
>>>>
>>>> Another subsequent "df -h /mnt/pvfs2-tmp" still returns "connection
>>>> timed out"
>>>> [root at serenity root]# df -h /mnt/pvfs2-tmp
>>>> df: `/mnt/pvfs2-tmp': Connection timed out
>>>>
>>>> After unloading the userspace and kernel module, restarting pvfs2
>>>> software, and remounting the /mnt/pvfs2-tmp filesystem, a "df -h
>>>> /mnt/pvfs2-tmp"
>>>> successfully completed
>>>> [root at serenity root]# df -h /mnt/pvfs2-tmp
>>>> Filesystem Size Used Avail Use% Mounted on
>>>> hostname:3334/pvfs2-fs
>>>> 1.9T 381G 1.6T 20% /mnt/pvfs2-tmp
>>>>
>>>>
>>>> The pvfs2 client log contained:
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:28] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] msgpair failed, will retry:: Connection refused
>>>> [E 02/22 11:29] *** msgpairarray_completion_fn: msgpair to server tcp://hvcwydev0329:3334 failed: Connection refused
>>>> [E 02/22 11:29] *** Out of retries.
>>>> [E 02/22 11:29] Statfs failed: Connection refused
>>>> [E 02/22 11:36] msgpair failed, will retry:: Operation cancelled (possibly due to timeout)
>>>> [E 02/22 11:39] msgpair failed, will retry:: Connection timed out
>>>> [E 02/22 11:42] msgpair failed, will retry:: Connection timed out