<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal;
        font-family:Arial;
        color:windowtext;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:Arial;
        color:navy;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>I’m seeing an issue when removing large files from a
PVFS2 file system. My example setup is a 12-node PVFS2 file system with 2.2 TB
ext3 SAN storage mounted on each PVFS2 server. The server is configured for
30-second timeouts and 5 retries. We would rather not change the timeout and
retry values if possible.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>There is an existing 2 TB file. When the client tries to
‘rm’ it, the request hits the 30-second timeout, the client exhausts its
retries, and rm then reports “Invalid argument” on the command line. From
everything I can tell, the file *<b><span
style='font-weight:bold'>really</span></b>* does get deleted and no longer
shows up in a directory listing.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>I’ve included the client command-line output and the
log messages from the delete below.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>bash-2.05b$ rm cmsdb_silo_mstr_20080606a<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>rm: cannot remove `cmsdb_silo_mstr_20080606a': Invalid
argument<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955100.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955103.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955106.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955109.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955112.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955115.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955118.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955121.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955124.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955127.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955130.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955133.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:35 clientNode1 PVFS2: [E] msgpair failed,
will retry: Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:29:36 clientNode1 last message repeated 11
times.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>&lt;SKIPPING REPEAT OF THE ABOVE 5 MORE TIMES&gt;<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server1HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server2HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server3HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server4HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server5HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server6HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server7HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server8HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
msgpair to server tcp://server9HA:3334 failed: Operation cancelled (possibly
due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server10HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server11HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] ***
msgpairarray_completion_fn: msgpair to server tcp://server12HA:3334 failed:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of
retries.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] Error: failed
removing one or more datafiles associated with the meta handle 1610612708<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] WARNING:
PVFS_sys_remove() encountered an error which may lead to inconsistent state:
Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 PVFS2: [E] WARNING: PVFS2
fsck (if available) may be needed.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:10 clientNode1 kernel: pvfs2: warning: got
error code without errno equivalent: -1610612865.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:59 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955696.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:32:59 clientNode1 PVFS2: [E] msgpair failed,
will retry: Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:33:29 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955732.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:33:29 clientNode1 PVFS2: [E] msgpair failed,
will retry: Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:33:59 clientNode1 PVFS2: [E]
job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 192955766.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:33:59 clientNode1 PVFS2: [E] msgpair failed,
will retry: Operation cancelled (possibly due to timeout)<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:34:20 clientNode1 PVFS2: [E] Error: failed
removing one or more datafiles associated with the meta handle 1252698765<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:34:20 clientNode1 PVFS2: [E] WARNING:
PVFS_sys_remove() encountered an error which may lead to inconsistent state: No
such file or directory<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Oct 2 10:34:20 clientNode1 PVFS2: [E] WARNING: PVFS2
fsck (if available) may be needed.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
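<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>As a rough sanity check on the timing in the log above, the
sketch below (plain Python, not part of PVFS2) compares the gap between the
first timeout and the “Out of retries” messages with the configured 30-second
timeout and 5 retries. The two timestamps are copied from the log; the variable
names are only for illustration, and I am assuming the syslog timestamps are
accurate to about a second.<o:p></o:p></span></font></p>
<pre>
```python
from datetime import datetime

# Timestamps copied from the client log above (syslog resolution is 1 s).
first_timeout = datetime.strptime("10:29:35", "%H:%M:%S")
out_of_retries = datetime.strptime("10:32:10", "%H:%M:%S")

timeout_secs = 30  # configured timeout, as stated in the setup description
retries = 5        # configured retry count, as stated in the setup description

# Time between the first timeout and the point where retries ran out.
elapsed = (out_of_retries - first_timeout).total_seconds()

# Average time consumed by each of the retries after the first timeout.
per_retry = elapsed / retries

print(f"first timeout to 'Out of retries': {elapsed:.0f} s")
print(f"average per retry: {per_retry:.1f} s (configured timeout: {timeout_secs} s)")
```
</pre>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>The gap works out to 155 seconds, or about 31 seconds per
retry, so the client appears to be waiting out the full timeout on every
attempt rather than failing early.<o:p></o:p></span></font></p>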
</div>
</body>
</html>