[Pvfs2-developers] crdirent
Sam Lang
slang at mcs.anl.gov
Tue Jun 13 09:25:55 EDT 2006
On Jun 12, 2006, at 10:07 PM, Rob Ross wrote:
> hey,
>
> i know we're trying to keep the # of DBs down, but would it really
> hurt that much to just use a separate DB for this data rather than
> having to play funny games with the key strings?
I don't have much preference either way. I don't find the null
string to be that much of a hack, but I can see the advantages of
having a separate db for stuff like this. One disadvantage of
separate dbs is that we can't just do one sync at the end of a
crdirent or rmdirent.
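
To make the null-string argument a bit more concrete, here is a rough sketch of why the empty key is cheap to skip. It is a toy, not dbpf code: toy_readdir and cmp_keys are made-up names, the keys are assumed to belong to a single handle, and strcmp stands in for whatever lexical comparison the db actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: plain byte-wise ordering of key strings,
 * standing in for the db's lexical key ordering (an assumption). */
static int cmp_keys(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical toy readdir: sorts the keys of one handle in place,
 * then copies the entry names into out[], skipping the "" info key.
 * Returns the number of entries copied. */
static size_t toy_readdir(const char **keys, size_t nkeys,
                          const char **out, size_t max_out)
{
    size_t i = 0, n = 0;

    qsort(keys, nkeys, sizeof(keys[0]), cmp_keys);

    /* "" compares less than any non-empty string, so if the info
     * key exists it can only sit at index 0. */
    if (nkeys > 0 && keys[0][0] == '\0')
        i = 1;

    for (; i < nkeys && n < max_out; i++)
        out[n++] = keys[i];
    return n;
}
```

The only special case is the single check on the first key; everything after it is an ordinary directory entry.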
>
> also, it seems a little wacky that we have to pass a flag to tell
> trove when to count and when not to count. is there a clean way to
> avoid that?
The problem is that dbpf doesn't know anything about the common
keys. We could copy the common keys into the dbpf layer, but that's
kind of an ugly hack. Also, the crdirent and rmdirent calls just give
a handle and the component name, so we can really only tell the
difference between common keys and everything else
(!is_this_a_common_key(key)). In this case "everything else" will be
either a component name or an xattr, so the best we could do is
count both xattrs and directory entries.
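
A quick sketch of what that check would look like if the common keys were copied down, and why it overcounts. Everything here is hypothetical: the key names in toy_common_keys are placeholders rather than the real common keys, and toy_count_non_common is just a name for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder names only. The real common keys live above dbpf,
 * which is exactly the problem being discussed. */
static const char *const toy_common_keys[] = { "ck_one", "ck_two", "ck_three" };

/* Returns 1 if key matches one of the (placeholder) common keys. */
static int is_this_a_common_key(const char *key)
{
    size_t i;
    size_t n = sizeof(toy_common_keys) / sizeof(toy_common_keys[0]);

    for (i = 0; i < n; i++)
        if (strcmp(key, toy_common_keys[i]) == 0)
            return 1;
    return 0;
}

/* Counts every key that is not a common key. There is no way to
 * separate xattrs from component names here, so both are counted. */
static size_t toy_count_non_common(const char *const *keys, size_t nkeys)
{
    size_t i, count = 0;

    for (i = 0; i < nkeys; i++)
        if (!is_this_a_common_key(keys[i]))
            count++;
    return count;
}
```

With one common key, two component names, and one xattr on a handle, this reports three, not two.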
We talked about just adding the count to every handle in the keyval
db. That adds a bunch of unnecessary keyval entries (one for each
file and directory). I was trying to avoid that, but maybe avoiding
that cost isn't worth the hassle.
>
> how do you read the count?
>
There's an additional trove_keyval_get_handle_info function.
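
Roughly how the pieces fit together, as an in-memory toy rather than the actual trove interface. The toy_ names, the TOY_ flag values, and struct toy_handle are all invented for the example; the real calls take more arguments. The toy also assumes the count starts at zero and ends up equal to the number of entries written with the count flag.

```c
#include <assert.h>
#include <string.h>

#define TOY_NOOVERWRITE         (1 << 0) /* stand-in for TROVE_NOOVERWRITE */
#define TOY_KEYVAL_HANDLE_COUNT (1 << 1) /* stand-in for TROVE_KEYVAL_HANDLE_COUNT */
#define TOY_MAX_KEYS 16
#define TOY_MAX_NAME 64

/* One handle's keyvals, plus the value the "" info keyval would hold. */
struct toy_handle {
    char keys[TOY_MAX_KEYS][TOY_MAX_NAME];
    int  nkeys;
    int  count;
};

/* Returns the index of key, or -1 if it is not present. */
static int toy_find(const struct toy_handle *h, const char *key)
{
    int i;
    for (i = 0; i < h->nkeys; i++)
        if (strcmp(h->keys[i], key) == 0)
            return i;
    return -1;
}

/* Returns 0 on success, -1 on failure. With TOY_NOOVERWRITE an
 * existing key is an error, so no separate existence read is needed. */
static int toy_keyval_write(struct toy_handle *h, const char *key, int flags)
{
    if (toy_find(h, key) >= 0)
        return (flags & TOY_NOOVERWRITE) ? -1 : 0;
    if (h->nkeys == TOY_MAX_KEYS || strlen(key) >= TOY_MAX_NAME)
        return -1;
    strcpy(h->keys[h->nkeys++], key);
    if (flags & TOY_KEYVAL_HANDLE_COUNT)
        h->count++;            /* count bumped inside the write itself */
    return 0;
}

/* Returns 0 on success, -1 if the key does not exist. */
static int toy_keyval_remove(struct toy_handle *h, const char *key, int flags)
{
    int idx = toy_find(h, key);
    if (idx < 0)
        return -1;
    if (idx != h->nkeys - 1)   /* fill the hole with the last key */
        strcpy(h->keys[idx], h->keys[h->nkeys - 1]);
    h->nkeys--;
    if (flags & TOY_KEYVAL_HANDLE_COUNT)
        h->count--;
    return 0;
}

/* Reads the count back, in the spirit of trove_keyval_get_handle_info. */
static int toy_keyval_get_handle_info(const struct toy_handle *h)
{
    return h->count;
}
```

The point is that the state machine never does a get followed by a set on the count; the write or remove either succeeds and adjusts the count, or fails and leaves it alone.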
> otherwise i think it's great that we're moving the count increment/
> decrement into trove, that this will allow for concurrent
> modification, and that we can simplify the state machines.
>
> thanks!
>
> rob
>
> Sam Lang wrote:
>> Hi all,
>> The new keyval code currently stores the size of a directory as a
>> separate common keyval. The server state machines update this
>> value with get/set state actions as needed (in
>> crdirent, rmdirent, etc.). This get and set actually prevents us
>> from allowing the create and delete operations of different files
>> in the same directory to take place concurrently, since the
>> crdirent and rmdirent ops (on the parent dirdata handle) get
>> serialized.
>> I'd like to fix all this by providing a keyval per handle that
>> contains a null string as part of the key (I call it keyval-handle-
>> info). The advantage of making it the null string is that it will
>> appear first in the lexical ordering of directory entries, so I
>> can skip over it in readdir easily. This null keyval would only
>> be created on handles as necessary (right now only for counting
>> dirents). The TROVE_KEYVAL_HANDLE_COUNT ds flag can be passed to
>> trove operations. For example, in the case of crdirent, the
>> TROVE_KEYVAL_HANDLE_COUNT and TROVE_NOOVERWRITE flags would be
>> passed to the trove_keyval_write call to specify that the count
>> should be incremented (or created and set to 0 if it doesn't
>> exist). rmdirent would do something similar in trove_keyval_remove.
>> Also, at present the crdirent and rmdirent state machines first do
>> a read of the keyval to check for existence. This seems
>> unnecessary. Instead, the crdirent sm can just pass
>> TROVE_NOOVERWRITE to the keyval_write call, and fail if that call
>> fails. rmdirent already fails if the keyval_remove fails so the
>> extra keyval_read to check for existence seems redundant. Are
>> there any good reasons for those extra state actions that I'm
>> missing?
>> I've attached a patch of the changes I've described. I would like
>> to have this go in to the trunk before the upcoming release, since
>> it requires (yet another) storage format change. Let me know if
>> there are any questions or concerns.
>