[Pvfs2-users] Problems with xattr settings with long arguments

Tony Kew tonykew at ccr.buffalo.edu
Mon Oct 27 10:38:31 EST 2008


Dear Phil,

That fixed it. I successfully ran a 64-node PBS job with a PVFS filesystem
built on the fly, and ran a 64-node iozone job against that filesystem.

The test directory was set up with 16K strips on each of the 64 nodes, for
a 1024K stripe, and iozone was set up to write in 1024K blocks. The job
took about an hour and five minutes to run.
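The layout arithmetic is simple: 64 nodes with 16K strips each give a 1024K stripe, so each 1024K iozone record covers exactly one full stripe and touches every server once. A minimal Python sketch of that mapping, assuming the varstrip distribution lays strips out round-robin in server-index order (an assumption for illustration, not taken from the PVFS2 source; the helper names are hypothetical):

```python
KIB = 1024

def stripe_size(strip_sizes):
    """Bytes in one full stripe: the sum of the per-server strip sizes."""
    return sum(strip_sizes)

def server_for_offset(offset, strip_sizes):
    """Return the index of the server holding byte `offset`.

    Assumes (hypothetically) that strips are placed round-robin in index
    order, so the pattern repeats every stripe_size(strip_sizes) bytes.
    """
    pos = offset % stripe_size(strip_sizes)  # position within current stripe
    for server, size in enumerate(strip_sizes):
        if pos < size:
            return server
        pos -= size
    raise AssertionError("unreachable: pos is always < stripe size")

# 64 servers with a 16K strip each, as in the test directory above.
strips = [16 * KIB] * 64
print(stripe_size(strips) // KIB)  # 1024
# A 1024K record starting on a stripe boundary touches all 64 servers.
record = range(0, 1024 * KIB, 16 * KIB)
print(len({server_for_offset(o, strips) for o in record}))  # 64
```

Because the record size equals the stripe size, records that start on stripe boundaries spread evenly across all servers; a record size that is not a multiple of the stripe would load some servers more than others.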

Initial write 668,321.05 KB/sec
      Rewrite 677,999.64 KB/sec
         Read 300,937.30 KB/sec
      Re-read 313,756.56 KB/sec

Three of the 64 nodes showed some timeouts in their server logs from the
job, but I believe these are non-fatal, e.g.:

###############################################
hostname: c17n29.ccr.buffalo.edu
###############################################
killing the PVFS2 server for PBS job: 1093826.bono.ccr.buffalo.edu
-----------------------------------------------
Server log file contents:
-----------------------------------------------
[E 10/24 17:25] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 3679067.
[E 10/24 17:25] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 3680783.
[E 10/24 17:25] fp_multiqueue_cancel: flow proto cancel called on 0x2a957f7b80
[E 10/24 17:25] handle_io_error: flow proto error cleanup started on 0x2a957f7b80: Operation cancelled (possibly due to timeout)
[E 10/24 17:25] handle_io_error: flow proto 0x2a957f7b80 canceled 1 operations, will clean up.
[E 10/24 17:25] handle_io_error: flow proto 0x2a957f7b80 error cleanup finished: Operation cancelled (possibly due to timeout)
[E 10/24 17:42] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 4959046.
[E 10/24 18:04]
PVFS2 server got signal 15 (server_status_flag: 262143)
-----------------------------------------------

I haven't yet tried setting <StorageHints> TroveMethod alt-aio or
increasing "ServerJobFlowTimeoutSecs", as you suggested to Brian.
In the latter case, the pvfs2-genconfig option --server-job-timeout 300
sets both "ServerJobFlowTimeoutSecs" and "ServerJobBMITimeoutSecs" to 300;
presumably this is what you want?

Incidentally, what does the "alt-aio" option do?

Thanks Much,
Tony

Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
 in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930           Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
      Cell: (716) 560-0910          Home: (716) 874-2126

"I love deadlines, I love the whooshing noise they make as they go by."
                                                          Douglas Adams



Phil Carns wrote:
> Whoops, thanks for catching that.  This additional patch should fix it.
>
> thanks,
> -Phil
>
> Tony Kew wrote:
>> Dear Phil,
>>
>> The patch works under Red Hat Enterprise Linux 5.2, but not under
>> RHEL 4 update 5, which doesn't have DB_BUFFER_SMALL in
>> /usr/include/db4/db.h
>>
>> [...]
>>
>>
>> Tony Kew wrote:
>>> Dear Phil,
>>>
>>> The patch looks good - I can set a 64 node config now:
>>> e.g.
>>>
>>> ramones$ setfattr -n "user.pvfs2.dist_name" -v "varstrip_dist" test
>>> ramones$ setfattr -n "user.pvfs2.dist_params" -v "strips:0:16K;1:16K;2:16K;3:16K;4:16K;5:16K;6:16K;7:16K;8:16K;9:16K;10:16K;11:16K;12:16K;13:16K;14:16K;15:16K;16:16K;17:16K;18:16K;19:16K;20:16K;21:16K;22:16K;23:16K;24:16K;25:16K;26:16K;27:16K;28:16K;29:16K;30:16K;31:16K;32:16K;33:16K;34:16K;35:16K;36:16K;37:16K;38:16K;39:16K;40:16K;41:16K;42:16K;43:16K;44:16K;45:16K;46:16K;47:16K;48:16K;49:16K;50:16K;51:16K;52:16K;53:16K;54:16K;55:16K;56:16K;57:16K;58:16K;59:16K;60:16K;61:16K;62:16K;63:16K" test
>>> ramones$ getfattr -n "user.pvfs2.dist_params" test
>>> # file: test
>>> user.pvfs2.dist_params="strips:0:16K;1:16K;2:16K;3:16K;4:16K;5:16K;6:16K;7:16K;8:16K;9:16K;10:16K;11:16K;12:16K;13:16K;14:16K;15:16K;16:16K;17:16K;18:16K;19:16K;20:16K;21:16K;22:16K;23:16K;24:16K;25:16K;26:16K;27:16K;28:16K;29:16K;30:16K;31:16K;32:16K;33:16K;34:16K;35:16K;36:16K;37:16K;38:16K;39:16K;40:16K;41:16K;42:16K;43:16K;44:16K;45:16K;46:16K;47:16K;48:16K;49:16K;50:16K;51:16K;52:16K;53:16K;54:16K;55:16K;56:16K;57:16K;58:16K;59:16K;60:16K;61:16K;62:16K;63:16K" 
>>>
>>>
>>> ramones$
>>>
>>> It may take a little while till I can install this on the cluster &
>>> test PVFSv2 over 64 nodes, but at least the parameter can be set :-)
>>>
>>> Thanks,
>>> Tony
>>>
>>> [...]
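Typing the 64 `strips` entries of the quoted setfattr example by hand is error-prone. A minimal Python sketch that builds the same kind of `user.pvfs2.dist_params` value, assuming equal strips on servers numbered 0 through N-1 and the `strips:<server>:<size>;...` format shown in the quoted transcript (the helper name `varstrip_params` is hypothetical, not part of PVFS2):

```python
def varstrip_params(num_servers, strip="16K"):
    """Build a varstrip_dist "strips:" value with the same strip on every server.

    Mirrors the format in the quoted setfattr example: "strips:" followed by
    semicolon-separated "<server>:<size>" pairs, servers numbered from 0.
    """
    pairs = ";".join(f"{i}:{strip}" for i in range(num_servers))
    return "strips:" + pairs

value = varstrip_params(64)
print(value[:30])            # strips:0:16K;1:16K;2:16K;3:16K
print(len(value.split(";")))  # 64
```

The resulting string is what would be passed as the `-v` argument to `setfattr -n "user.pvfs2.dist_params"`, after setting `user.pvfs2.dist_name` to `varstrip_dist`.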

