rross at mcs.anl.gov
Mon Jun 12 23:07:31 EDT 2006
I know we're trying to keep the number of DBs down, but would it really hurt
that much to just use a separate DB for this data rather than having to
play funny games with the key strings?
Also, it seems a little wacky that we have to pass a flag to tell trove
when to count and when not to count. Is there a clean way to avoid that?
How do you read the count?
Otherwise I think it's great that we're moving the count
increment/decrement into trove, that this will allow for concurrent
modification, and that we can simplify the state machines.
Sam Lang wrote:
> Hi all,
> The new keyval code currently stores the size of a directory as a
> separate common keyval. The server state machines update this value
> with get/set state actions as needed (in crdirent, rmdirent, etc.). This
> get and set actually prevents us from allowing the create and delete
> operations of different files in the same directory to take place
> concurrently, since the crdirent and rmdirent ops (on the parent dirdata
> handle) get serialized.
> I'd like to fix all this by providing a keyval per handle that contains
> a null string as part of the key (I call it keyval-handle-info). The
> advantage of making it the null string is that it will appear first in
> the lexical ordering of directory entries, so I can skip over it in
> readdir easily. This null keyval would only be created on handles as
> necessary (right now only for counting dirents). The
> TROVE_KEYVAL_HANDLE_COUNT ds flag can be passed to trove operations. For
> example, in the case of crdirent, the TROVE_KEYVAL_HANDLE_COUNT and
> TROVE_NOOVERWRITE flags would be passed to the trove_keyval_write call
> to specify that the count should be incremented (or created and set to
> 0 if it doesn't exist). rmdirent would do something similar in
> Also, at present the crdirent and rmdirent state machines first do a
> read of the keyval to check for existence. This seems unnecessary.
> Instead, the crdirent sm can just pass TROVE_NOOVERWRITE to the
> keyval_write call, and fail if that call fails. rmdirent already fails
> if the keyval_remove fails so the extra keyval_read to check for
> existence seems redundant. Are there any good reasons for those extra
> state actions that I'm missing?
> I've attached a patch of the changes I've described. I would like to
> have this go into the trunk before the upcoming release, since it
> requires (yet another) storage format change. Let me know if there are
> any questions or concerns.
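To make the proposal above concrete, here is a rough sketch of the idea as I read it: a sorted in-memory keyval list where the empty key holds the per-handle count, a write that fails on an existing key instead of doing a separate read first, and a readdir that skips the leading info entry. This is not the Trove code or the attached patch. Every name here (sk_store, sk_write, SK_NOOVERWRITE, SK_HANDLE_COUNT, and so on) is made up for illustration, and the choice to create the counter already counting the first entry is my assumption.

```c
#include <string.h>

/* Hypothetical flags for this sketch; not the real Trove definitions. */
#define SK_NOOVERWRITE  0x1  /* fail the write if the key already exists */
#define SK_HANDLE_COUNT 0x2  /* keep the per-handle entry count up to date */

#define SK_MAX_ENTRIES 64
#define SK_KEY_LEN     32

struct sk_entry { char key[SK_KEY_LEN]; long val; };
struct sk_store { struct sk_entry e[SK_MAX_ENTRIES]; int n; };  /* sorted by key */

/* Return the slot holding key, or the slot where it would be inserted. */
static int sk_find(const struct sk_store *s, const char *key, int *found)
{
    int i;
    *found = 0;
    for (i = 0; i < s->n; i++) {
        int c = strcmp(s->e[i].key, key);
        if (c == 0) { *found = 1; return i; }
        if (c > 0) break;
    }
    return i;
}

/* Returns 1 if a new key was inserted, 0 if overwritten, -1 on failure. */
static int sk_put(struct sk_store *s, const char *key, long val, int overwrite)
{
    int found, i = sk_find(s, key, &found);
    if (found) {
        if (!overwrite) return -1;
        s->e[i].val = val;
        return 0;
    }
    if (s->n == SK_MAX_ENTRIES || strlen(key) >= SK_KEY_LEN) return -1;
    memmove(&s->e[i + 1], &s->e[i], (size_t)(s->n - i) * sizeof(s->e[0]));
    strcpy(s->e[i].key, key);
    s->e[i].val = val;
    s->n++;
    return 1;
}

/* The count lives under the empty key, which sorts before every real name. */
static int sk_adjust_count(struct sk_store *s, long delta)
{
    int found, i = sk_find(s, "", &found);
    if (found) { s->e[i].val += delta; return 0; }
    return sk_put(s, "", delta, 0) < 0 ? -1 : 0;  /* assumption: create on first use */
}

/* crdirent-style write: no read-before-write, the flag does the check. */
int sk_write(struct sk_store *s, const char *name, long val, int flags)
{
    int r;
    if (name[0] == '\0') return -1;  /* the empty key is reserved */
    r = sk_put(s, name, val, !(flags & SK_NOOVERWRITE));
    if (r < 0) return -1;
    if (r == 1 && (flags & SK_HANDLE_COUNT)) return sk_adjust_count(s, 1);
    return 0;
}

/* rmdirent-style remove: fails if the entry is missing, else decrements. */
int sk_remove(struct sk_store *s, const char *name, int flags)
{
    int found, i;
    if (name[0] == '\0') return -1;
    i = sk_find(s, name, &found);
    if (!found) return -1;
    memmove(&s->e[i], &s->e[i + 1], (size_t)(s->n - i - 1) * sizeof(s->e[0]));
    s->n--;
    if (flags & SK_HANDLE_COUNT) return sk_adjust_count(s, -1);
    return 0;
}

/* One possible way to read the count: look up the empty key. */
long sk_count(const struct sk_store *s)
{
    int found, i = sk_find(s, "", &found);
    return found ? s->e[i].val : 0;
}

/* List entry names, skipping the info keyval if it is the first entry. */
int sk_readdir(const struct sk_store *s, const char **names, int max)
{
    int i = (s->n > 0 && s->e[0].key[0] == '\0') ? 1 : 0;
    int out = 0;
    for (; i < s->n && out < max; i++) names[out++] = s->e[i].key;
    return out;
}
```

With a zeroed sk_store, creating two names with both flags set and then creating one of them again makes the second create fail without touching the count. Removing a name with SK_HANDLE_COUNT decrements it. In a real store the insert and the count update would need to happen atomically for concurrent creates and removes to be safe; the sketch ignores that.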