[PVFS-users] mmargo

Martin Margo mmargo at sdsc.edu
Fri Apr 2 14:08:32 EST 2004


Ron,

I thought about doing that, since our cluster is temporary and used for
benchmarking, but it fails so often that I think it is useless
without the fix. My IOR run scales the block size over 20 nodes
to see which block size is optimal for writing a 1 GB file from each
client. Then I take that optimum and scale the I/O until I get
as close as I can to the theoretical maximum speed.
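
To make the two steps concrete, here is a rough sketch of the procedure in
Python. It is not the script I actually run. The measure(block_size, clients)
callable is a hypothetical stand-in for one IOR run that returns write
throughput in MB/s, and the 0.9 tolerance is an assumption picked for
illustration.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical signature: measure(block_size_bytes, clients) -> MB/s.
Measure = Callable[[int, int], float]


def best_block_size(block_sizes: Iterable[int], measure: Measure,
                    clients: int = 20) -> int:
    """Step 1: sweep block sizes at a fixed client count, keep the fastest."""
    return max(block_sizes, key=lambda bs: measure(bs, clients))


def scale_clients(block_size: int, client_counts: Iterable[int],
                  measure: Measure, theoretical_max: float,
                  tolerance: float = 0.9) -> Tuple[int, float]:
    """Step 2: grow the client count until throughput is close to the
    theoretical maximum. Returns (clients, MB/s) for the best run seen."""
    best = (0, 0.0)
    for n in client_counts:
        mbps = measure(block_size, n)
        if mbps > best[1]:
            best = (n, mbps)
        if mbps >= tolerance * theoretical_max:
            break  # close enough to the ceiling, stop scaling
    return best
```

A run that dies partway through (as mine do now) would have to raise or
return 0 from measure, which is why the sweep is not worth much until the
crashes are fixed.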

If PVFS crashes on my smallest case, then it will definitely fail for  
the larger ones.

-Martin
On Apr 2, 2004, at 3:34 PM, Ron W. Green wrote:

> you aren't going to like it - we take system time and restart all the  
> iods, pvfsds and mgr.  then it runs OK for a while.
>
> It seems we can restart iods and mgr without killing users.  however,  
> I've seen user jobs die when I restart pvfsclient on the client nodes.
>
> I have an open question as to whether I can get away with just  
> restarting iods and mgr.  If you find out let me know.
>
> ron
>
> Martin Margo wrote:
>
>> Ron,
>>
>> Thanks for the heads up. I thought that the January patch would fix
>> the problem; at least some other folks on this list reported that.
>> What is your workaround for this problem?
>>
>> -Martin
>> On Apr 2, 2004, at 2:18 PM, Ron W. Green wrote:
>>
>>> we're using 1.6.2 with the january patch and are seeing these
>>> enqueue messages.
>>>
>>> if it helps, we have 236 client nodes talking to the one mgr node,
>>> and have 6 iod nodes.  Is it possible that we need a much deeper
>>> queue depth to accommodate long latencies in talking to the mgr?
>>> Are the clients spilling out of their local queues with requests
>>> waiting on mgr?  I suspect it may be a scaling issue.
>>>
>>> thanks, I do appreciate the work being done on PVFS.  It is  
>>> improving.
>>>
>>> ron
>>>
>>> Nathan Poznick wrote:
>>>
>>>> _______________________________________________
>>>> PVFS-users mailing list
>>>> PVFS-users at www.beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs-users
>>>>
>>>> Date: Fri, 2 Apr 2004 10:27:54 -0700
>>>>
>>>> Thus spake Ron W. Green:
>>>>
>>>>> Martin,
>>>>>
>>>>> We seem to get those "failed on enqueue" quite often.  Of course,
>>>>> our cluster is much bigger too.  I've scratched my head on this,
>>>>> and looked at the code.  The best I can tell it is when the pvfs
>>>>> client attempts a metadata operation to the mgr node.  I suspect
>>>>> that the mgr is slow in responding and/or has run out of queueing
>>>>> space to enqueue the metadata operation request (create or stat).
>>>>>
>>>>> Anyone on the list know if mgr has a fixed queue size?  Or can we
>>>>> jack up the client timeouts?  Multithread mgr?
>>>>>
>>>>> From our testing we're quite convinced the problem lies in mgr,
>>>>> that it can't keep up with metadata requests from the clients.
>>>>>
>>>>
>>>> Actually those messages are not referring to any sort of queuing on
>>>> the manager at all - they refer to the pvfsdev_enqueue/dequeue
>>>> functions in the kernel module which add/remove messages from the
>>>> /dev/pvfs-req device.
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> Ron W. Green
>>> rwgree at sandia.gov
>>> +1-505-284-1600
>>>
>>> Sr. Engineer, ICC Applications Support
>>>
>>>
>>>
>>
>>
>
> -- 
> Ron W. Green
> rwgree at sandia.gov
> +1-505-284-1600
>
> Sr. Engineer, ICC Applications Support
>
>
>


