[Pvfs2-developers] PVFS2 Removal of large files
Phil Carns
carns at mcs.anl.gov
Fri Oct 3 14:53:55 EDT 2008
It sounds like there are really two problems in this case:
a) the remove takes a long time
b) trove won't work on other metadata operations until the remove completes
I don't know if this is a good idea or not, but here is a potential
quick way to make the remove work in the background:
Right now the actual work happens in a DBPF_UNLINK() macro, which is
currently just defined as the unlink() function. That could be replaced
with a new function that does something like this:
- rename file
- create thread to do unlink of renamed file
- return
That would probably take care of both a) and b), but with the tradeoff
that the space wouldn't necessarily show up as free (in df or whatever)
immediately after the remove completes at the command line. Trove
initialization and/or fsck would also need to be updated to clean up
renamed files if a server died before the unlink thread finished.
Another option would be to solve b) by changing trove so that it has
multiple worker threads for metadata operations, but that seems like a
lot of work for this particular problem, and still wouldn't actually
solve a).
As Emmanuel Florac suggested, it would make sense to tackle the problem
at the underlying file system level too, but I have a feeling that
option isn't on the table for you guys if you are using RHEL4.
-Phil
David Metheny wrote:
> I’ve run a couple tests just to get an idea of what it might take to
> delete large data files (bstreams).
>
> * From an idle RHEL4 U6 i386 server, with local EXT3, it seems to
> take around 120 seconds to complete a delete of a 100GB file.
> * From an idle RHEL4 U6 i386 server, with SAN attached storage, it
> seems to take around [180] seconds to complete a removal of a
> 200GB file.
>
>
>
> Something else I’ve noticed is that during a remove of large files
> like this, other metadata operations (e.g. ‘ls’), as well as I/O
> operations, are also starting to time out and retry, and depending on
> the timeframe, cancel as well.
>
>
>
> What are you thinking along the lines of tuning the client side? I’m
> assuming this would require the remove to complete at least once, either
> with a valid object delete, or recognizing the object isn’t there
> anymore. Would this tuning also take into account that the delete might
> take longer than the client timeout/retry settings?
>
>
>
> We’ve bounced ideas around in the past about lazy deletes and such,
> allowing the delete to occur, but not hold up other trove operations.
> Any suggestions here?
>
>
>
>
>
> ------------------------------------------------------------------------
>
> *From:* Rob Ross [mailto:rross at mcs.anl.gov]
> *Sent:* Thursday, October 02, 2008 3:51 PM
> *To:* david.metheny at gmail.com
> *Cc:* <pvfs2-developers at beowulf-underground.org>
> *Subject:* Re: [Pvfs2-developers] PVFS2 Removal of large files
>
>
>
> Maybe removing the 2TByte file takes longer than 30 seconds on ext3, so
> client times out. It would be useful to know when the server first
> succeeds. Maybe some tuning on client side to catch the case where on
> retry the objects aren't there?
>
> -- Rob
>
>
> On Oct 2, 2008, at 3:21 PM, "David Metheny" <david.metheny at gmail.com
> <mailto:david.metheny at gmail.com>> wrote:
>
>> I’m seeing an issue when removing large files from a PVFS2 file
>> system. My example setup is a 12 node PVFS2 file system with 2.2TB
>> EXT3 SAN mounts to each pvfs2 server. The server is configured for 30
>> second timeouts and 5 retries. We really don’t want to change the
>> timeout values and retries if possible.
>>
>>
>>
>> There is a 2TB file that exists. When the client tries to ‘rm’ the 2TB
>> file, the client basically goes through the 30 second timeout and
>> exhausts the retries and then reports back to the command line
>> “Invalid Argument”. From everything I can tell, the file **really**
>> gets deleted and doesn’t show up in a directory listing.
>>
>>
>>
>> I’ve included the client command line results and the log messages
>> from the delete below
>>
>>
>>
>> bash-2.05b$ rm cmsdb_silo_mstr_20080606a
>>
>> rm: cannot remove `cmsdb_silo_mstr_20080606a': Invalid argument
>>
>>
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955100.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955103.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955106.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955109.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955112.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955115.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955118.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955121.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955124.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955127.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955130.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955133.
>>
>> Oct 2 10:29:35 clientNode1 PVFS2: [E] msgpair failed, will retry:
>> Operation cancelled (possibly due to timeout)
>>
>> Oct 2 10:29:36 clientNode1 last message repeated 11 times.
>>
>>
>>
>> <SKIPPING REPEAT OF THE ABOVE 5 MORE TIMES>
>>
>>
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server1HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server2HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server3HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server4HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server5HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server6HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server7HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server8HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server9HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server10HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server11HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** msgpairarray_completion_fn:
>> msgpair to server tcp://server12HA:3334 failed: Operation cancelled
>> (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] *** Out of retries.
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] Error: failed removing one or
>> more datafiles associated with the meta handle 1610612708
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] WARNING: PVFS_sys_remove()
>> encountered an error which may lead to inconsistent state: Operation
>> cancelled (possibly due to timeout)
>>
>> Oct 2 10:32:10 clientNode1 PVFS2: [E] WARNING: PVFS2 fsck (if
>> available) may be needed.
>>
>> Oct 2 10:32:10 clientNode1 kernel: pvfs2: warning: got error code
>> without errno equivalent: -1610612865.
>>
>> Oct 2 10:32:59 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955696.
>>
>> Oct 2 10:32:59 clientNode1 PVFS2: [E] msgpair failed, will retry:
>> Operation cancelled (possibly due to timeout)
>>
>> Oct 2 10:33:29 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955732.
>>
>> Oct 2 10:33:29 clientNode1 PVFS2: [E] msgpair failed, will retry:
>> Operation cancelled (possibly due to timeout)
>>
>> Oct 2 10:33:59 clientNode1 PVFS2: [E] job_time_mgr_expire: job time
>> out: cancelling bmi operation, job_id: 192955766.
>>
>> Oct 2 10:33:59 clientNode1 PVFS2: [E] msgpair failed, will retry:
>> Operation cancelled (possibly due to timeout)
>>
>> Oct 2 10:34:20 clientNode1 PVFS2: [E] Error: failed removing one or
>> more datafiles associated with the meta handle 1252698765
>>
>> Oct 2 10:34:20 clientNode1 PVFS2: [E] WARNING: PVFS_sys_remove()
>> encountered an error which may lead to inconsistent state: No such
>> file or directory
>>
>> Oct 2 10:34:20 clientNode1 PVFS2: [E] WARNING: PVFS2 fsck (if
>> available) may be needed.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Pvfs2-developers mailing list
>> Pvfs2-developers at beowulf-underground.org
>> <mailto:Pvfs2-developers at beowulf-underground.org>
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
>