[Pvfs2-developers] patches: bug fixes
Sam Lang
slang at mcs.anl.gov
Thu Jun 8 17:47:02 EDT 2006
These looked ok to me so I went ahead and committed them. Thanks Phil!
-sam
On Jun 7, 2006, at 10:54 AM, Phil Carns wrote:
> All of the patches listed in this email are limited scope- they
> just fix specific bugs and don't make any protocol or storage
> format changes:
>
> pvfs2-kernel-permissions.patch:
> -----------
> The pvfs2_permission() function is significantly different (due to
> #ifdefs) based on whether or not the kernel provides a
> generic_permission() function which was introduced around 2.6.10.
> The code path that gets triggered for kernel versions < 2.6.10 has
> to duplicate a good bit of permission checking functionality, but
> it got a few details wrong. This patch corrects that, based on
> examples from the old vfs_permission() function and how NFS
> implemented ACL support before generic_permission() was available.
> There are a set of about 5 or so LTP file system tests that will
> not pass on 2.6.9 with PVFS2 unless this patch is applied.
>
> sys-remove-crdirent.patch:
> -----------
> There is a corner case in sys-remove in which if an operation
> fails, sys-remove has to replace a directory entry. The msgpair
> setup for this operation was using the wrong handle to determine
> which server to send the request to, however, so the request is
> sent to the server holding the removed object rather than the
> server holding the parent directory. The doesn't trigger a problem
> unless a) you use multiple meta servers and b) you have the
> misfortune of triggering this sys-remove error case.
>
> balance-io-servers.patch:
> -----------
> The PINT_cached_config_next_io() function uses a round robin
> algorithm to pick the next set of I/O servers to use when creating
> files. The logic had a subtle flaw though. If you use all I/O
> servers (as most people do), then the I/O server slection is
> balanced. However, if you configure the file system to use only
> one datafile, it would result in every other server being skipped.
> If the number of total number of servers is even, then the wrap-
> around lines up every time to make it so that half of the servers
> pretty much stay overloaded in terms of I/O traffic. This patch
> corrects the round robin algorithm to make sure that every I/O
> server is used fairly, so that we are balanced regardless of how
> many datafiles are used. I can give a more detailed explanation of
> what went wrong here if needed (I think I am the one that
> introduced the problem).
>
> env-shmem-cleanup.patch:
> -----------
> This patch makes 3 improvements to the berkeley db cache usage:
> - use the port number of the server's host id as a component of the
> shmem key to use (if shmem style caching is in effect). This
> insures that if 2 servers are started on the same server, they will
> not use the same shmem region.
> - on startup, always forcibly remove any old environment remnants
> before opening the environment. This makes sure that if the
> previous server instance died unexpectedly, the leftover __db.*
> files won't interfere with anything. These old files don't serve
> any purpose for the next pvfs2-server run, especially without
> transactions.
> - look for a particular error code on env open that could be an
> indicator that the cache size was too big for the RAM size of the
> machine, and print a warning message.
>
> test-chown-sudo.patch
> -----------
> This fixes a minor bug in how sudo is handled in test programs
> related to the chown() operation.
>
> test-sgid-error-msg.patch
> -----------
> This patch adds a clearer error message if test-setgid fails
> because of bad arguments.
>
> -Phil
> Index: pvfs2_src/src/common/misc/pint-cached-config.c
> ===================================================================
> --- pvfs2_src/src/common/misc/pint-cached-config.c (revision 1923)
> +++ pvfs2_src/src/common/misc/pint-cached-config.c (revision 1924)
> @@ -277,6 +277,7 @@
> }
> else
> {
> + /* we let the jitter loop below increment the
> cursor by one */
> jitter = 0;
> }
> while(jitter-- > -1)
> @@ -338,6 +339,7 @@
> struct qlist_head *hash_link = NULL;
> struct config_fs_cache_s *cur_config_cache = NULL;
> int jitter = 0, num_io_servers = 0;
> + PINT_llist* old_data_server_cursor = NULL;
>
> if (config && num_servers && io_handle_extent_array)
> {
> @@ -353,6 +355,7 @@
> num_io_servers = PINT_llist_count(
> cur_config_cache->fs->data_handle_ranges);
>
> +
> /* pick random starting point, then round robin */
> if(!io_randomized)
> {
> @@ -378,6 +381,7 @@
> cur_config_cache->data_server_cursor =
> PINT_llist_next(
> cur_config_cache->data_server_cursor);
> }
> + old_data_server_cursor = cur_config_cache-
> >data_server_cursor;
>
> while(num_servers)
> {
> @@ -418,6 +422,10 @@
> io_addr_array++;
> }
> ret = ((num_servers == 0) ? 0 : ret);
> + /* reset data server cursor to point to the old
> cursor; the
> + * jitter on the next iteration will increment it by one
> + */
> + cur_config_cache->data_server_cursor =
> old_data_server_cursor;
> }
> }
> return ret;
> diff -Naur pvfs2-old/src/io/trove/trove-dbpf/dbpf-mgmt.c pvfs2/src/
> io/trove/trove-dbpf/dbpf-mgmt.c
> --- pvfs2-old/src/io/trove/trove-dbpf/dbpf-mgmt.c 2006-06-05
> 21:57:27.000000000 +0200
> +++ pvfs2/src/io/trove/trove-dbpf/dbpf-mgmt.c 2006-06-07
> 05:26:42.000000000 +0200
> @@ -45,6 +45,7 @@
> char dbpf_method_name[] = "dbpf";
>
> extern int TROVE_db_cache_size_bytes;
> +extern int TROVE_shm_key_hint;
>
> struct dbpf_storage *my_storage_p = NULL;
> static int db_open_count, db_close_count;
> @@ -63,12 +64,22 @@
> *error = -EINVAL;
> return NULL;
> }
> +
> + /* we start by making sure any old environment remnants are
> cleaned up */
> + ret = db_env_create(&dbenv, 0);
> + if (ret != 0)
> + {
> + gossip_err("dbpf_putdb_env: could not create any
> environment handle: %s\n", db_strerror(ret));
> + return 0;
> + }
> + /* don't check return code here; we don't care if it fails */
> + dbenv->remove(dbenv, path, DB_FORCE);
>
> retry:
> ret = db_env_create(&dbenv, 0);
> if (ret != 0)
> {
> - gossip_lerr("dbpf_getdb_env: %s\n", db_strerror(ret));
> + gossip_err("dbpf_getdb_env: %s\n", db_strerror(ret));
> *error = ret;
> return NULL;
> }
> @@ -99,7 +110,7 @@
> DB_INIT_MPOOL|DB_CREATE|DB_THREAD, 0);
> if(ret != 0)
> {
> - gossip_lerr("dbpf_getdb_env(%s): %s\n", path,
> db_strerror(ret));
> + gossip_err("dbpf_getdb_env(%s): %s\n", path,
> db_strerror(ret));
> *error = ret;
> return NULL;
> }
> @@ -107,11 +118,12 @@
> else
> {
> /* default to using shm style cache */
> -
> - ret = dbenv->set_shm_key(dbenv, 646567223);
> + gossip_debug(GOSSIP_TROVE_DEBUG, "dbpf using shm key: %d\n",
> + (646567223+TROVE_shm_key_hint));
> + ret = dbenv->set_shm_key(dbenv, (646567223
> +TROVE_shm_key_hint));
> if(ret != 0)
> {
> - gossip_lerr("dbenv->set_shm_key(%s): %s\n",
> + gossip_err("dbenv->set_shm_key(%s): %s\n",
> path, db_strerror(ret));
> *error = ret;
> return NULL;
> @@ -144,9 +156,18 @@
> goto retry;
> }
>
> + if(ret == DB_RUNRECOVERY)
> + {
> + gossip_err("dbpf_getdb_env(): DB_RUNRECOVERY on
> environment open.\n");
> + gossip_err(
> + "\n\n"
> + " Please make sure that you have not chosen a
> DBCacheSizeBytes\n"
> + " configuration file value that is too large for
> your machine.\n\n");
> + }
> +
> if(ret != 0)
> {
> - gossip_lerr("dbpf_getdb_env(%s): %s\n", path,
> db_strerror(ret));
> + gossip_err("dbpf_getdb_env(%s): %s\n", path,
> db_strerror(ret));
> *error = ret;
> return NULL;
> }
> diff -Naur pvfs2-old/src/io/trove/trove.c pvfs2/src/io/trove/trove.c
> --- pvfs2-old/src/io/trove/trove.c 2006-05-26 00:17:19.000000000 +0200
> +++ pvfs2/src/io/trove/trove.c 2006-06-07 05:23:53.000000000 +0200
> @@ -30,6 +30,7 @@
> struct PINT_perf_counter* PINT_server_pc = NULL;
>
> int TROVE_db_cache_size_bytes = 0;
> +int TROVE_shm_key_hint = 0;
>
> /** Initiate reading from a contiguous region in a bstream into a
> * contiguous region in memory.
> @@ -930,6 +931,11 @@
> TROVE_db_cache_size_bytes = *((int *)parameter);
> return 0;
> }
> + if(option == TROVE_SHM_KEY_HINT)
> + {
> + TROVE_shm_key_hint = *((int*)parameter);
> + return(0);
> + }
>
> method_id = map_coll_id_to_method(coll_id);
> if (method_id < 0) {
> diff -Naur pvfs2-old/src/io/trove/trove.h pvfs2/src/io/trove/trove.h
> --- pvfs2-old/src/io/trove/trove.h 2006-06-05 21:57:26.000000000 +0200
> +++ pvfs2/src/io/trove/trove.h 2006-06-07 05:24:29.000000000 +0200
> @@ -75,7 +75,8 @@
> TROVE_COLLECTION_ATTR_CACHE_INITIALIZE,
> TROVE_DB_CACHE_SIZE_BYTES,
> TROVE_COLLECTION_COALESCING_HIGH_WATERMARK,
> - TROVE_COLLECTION_COALESCING_LOW_WATERMARK
> + TROVE_COLLECTION_COALESCING_LOW_WATERMARK,
> + TROVE_SHM_KEY_HINT
> };
>
> /** Initializes the Trove layer. Must be called before any other
> Trove
> diff -Naur pvfs2-old/src/server/pvfs2-server.c pvfs2/src/server/
> pvfs2-server.c
> --- pvfs2-old/src/server/pvfs2-server.c 2006-06-05
> 21:57:28.000000000 +0200
> +++ pvfs2/src/server/pvfs2-server.c 2006-06-07 05:23:19.000000000
> +0200
> @@ -155,6 +155,7 @@
> static int create_pidfile(char *pidfile);
> static void write_pidfile(int fd);
> static void remove_pidfile(void);
> +static int parse_port_from_host_id(char* host_id);
>
> /* table of incoming request types and associated parameters */
> struct PINT_server_req_params PINT_server_req_table[] =
> @@ -901,6 +902,7 @@
> char buf[16] = {0};
> PVFS_fs_id orig_fsid;
> PVFS_ds_flags init_flags = 0;
> + int port_num = 0;
>
> /* Initialize distributions */
> ret = PINT_dist_initialize(0);
> @@ -946,6 +948,16 @@
> /* this should never fail */
> assert(ret == 0);
>
> + /* parse port number and allow trove to use it to help
> differentiate
> + * shmem regions if needed
> + */
> + port_num = parse_port_from_host_id(server_config.host_id);
> + if(port_num > 0)
> + {
> + ret = trove_collection_setinfo(0, 0, TROVE_SHM_KEY_HINT,
> &port_num);
> + assert(ret == 0);
> + }
> +
> if(server_config.db_cache_type && (!strcmp
> (server_config.db_cache_type,
> "mmap")))
> {
> @@ -1850,6 +1862,37 @@
> return &server_config;
> }
>
> +/* parse_port_from_host_id()
> + *
> + * attempts to parse the port number from a BMI address. Returns
> port number
> + * on success, -1 on failure
> + */
> +static int parse_port_from_host_id(char* host_id)
> +{
> + int ret = -1;
> + int port_num;
> + char* port_index;
> + char* colon_index;
> +
> + /* see if we have a <proto>://<hostname>:<port> format */
> + port_index = rindex(host_id, ':');
> + colon_index = index(host_id, ':');
> + /* if so, parse the port number */
> + if(port_index && (port_index != colon_index))
> + {
> + port_index++;
> + ret = sscanf(port_index, "%d", &port_num);
> + }
> +
> + /* report error if we don't find a valid port number in the
> string */
> + if(ret != 1)
> + {
> + return(-1);
> + }
> +
> + return(port_num);
> +}
> +
> /*
> * Local variables:
> * c-indent-level: 4
> Index: pvfs2_src/src/kernel/linux-2.6/acl.c
> ===================================================================
> --- pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1653)
> +++ pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1654)
> @@ -567,9 +567,12 @@
> return -EACCES;
> }
> /* Do permission checks only for lookups and opens */
> - if (!nd || !(nd->flags & (LOOKUP_OPEN | LOOKUP_ACCESS)))
> - {
> - return 0;
> + if ((mask & MAY_EXEC) == 0) {
> + if (!nd || !(nd->flags & (LOOKUP_OPEN | LOOKUP_ACCESS)))
> + {
> + pvfs2_print("pvfs2_permission: returning OK when
> checking for lookups and opens access!\n");
> + return 0;
> + }
> }
> if (current->fsuid == inode->i_uid)
> {
> @@ -608,18 +611,25 @@
> }
> pvfs2_print("pvfs2_permission: mode & mask & S_IRWXO = %d,
> mask = %d\n", mode & mask & S_IRWXO, mask);
> if ((mode & mask & S_IRWXO) == mask)
> + {
> + pvfs2_print("pvfs2_permission: returning zero after
> node&mask&S_IRWXO check!\n");
> return 0;
> + }
> check_capabilities:
> /* Are we allowed to override DAC */
> if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
> {
> if (capable(CAP_DAC_OVERRIDE))
> + {
> + pvfs2_print("pvfs2_permission: returning zero after
> mask & MAY_READ|MAYWRITE %d OR inode->i_mode & S_IXUGO %d !\n",mask
> & (MAY_READ|MAY_WRITE), inode->i_mode & S_IXUGO);
> return 0;
> + }
> }
> /* Allow read/search if CAP_DAC_READ_SEARCH */
> if (capable(CAP_DAC_READ_SEARCH) &&
> ((mask == MAY_READ) || (S_ISDIR(inode->i_mode) && !
> (mask && MAY_WRITE))))
> {
> + pvfs2_print("pvfs2_permission: returningg zero after
> capable(CAP_DAC_READ_SEARCH) %d && (mask == MAY_READ) %d || (S_ISDIR
> (inode->i_mode) %d && !(mask && MAY_WRITE) %d\n",capable
> (CAP_DAC_READ_SEARCH),mask == MAY_READ,S_ISDIR(inode->i_mode),!
> (mask && MAY_WRITE));
> return 0;
> }
> pvfs2_print("pvfs2_permission: disallowing access\n");
> Index: pvfs2_src/src/kernel/linux-2.6/acl.c
> ===================================================================
> --- pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1718)
> +++ pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1719)
> @@ -570,7 +570,6 @@
> if ((mask & MAY_EXEC) == 0) {
> if (!nd || !(nd->flags & (LOOKUP_OPEN | LOOKUP_ACCESS)))
> {
> - pvfs2_print("pvfs2_permission: returning OK when
> checking for lookups and opens access!\n");
> return 0;
> }
> }
> @@ -609,29 +608,20 @@
> if (in_group_p(inode->i_gid))
> mode >>= 3;
> }
> - pvfs2_print("pvfs2_permission: mode & mask & S_IRWXO = %d,
> mask = %d\n", mode & mask & S_IRWXO, mask);
> if ((mode & mask & S_IRWXO) == mask)
> {
> - pvfs2_print("pvfs2_permission: returning zero after
> node&mask&S_IRWXO check!\n");
> return 0;
> }
> check_capabilities:
> /* Are we allowed to override DAC */
> - if ((mask & (MAY_READ|MAY_WRITE)) || (inode->i_mode & S_IXUGO))
> + if(!(mask & MAY_EXEC) || (inode->i_mode & S_IXUGO) || S_ISDIR
> (inode->i_mode))
> {
> - if (capable(CAP_DAC_OVERRIDE))
> + if(capable(CAP_DAC_OVERRIDE))
> {
> - pvfs2_print("pvfs2_permission: returning zero after
> mask & MAY_READ|MAYWRITE %d OR inode->i_mode & S_IXUGO %d !\n",mask
> & (MAY_READ|MAY_WRITE), inode->i_mode & S_IXUGO);
> return 0;
> }
> }
> - /* Allow read/search if CAP_DAC_READ_SEARCH */
> - if (capable(CAP_DAC_READ_SEARCH) &&
> - ((mask == MAY_READ) || (S_ISDIR(inode->i_mode) && !
> (mask && MAY_WRITE))))
> - {
> - pvfs2_print("pvfs2_permission: returningg zero after
> capable(CAP_DAC_READ_SEARCH) %d && (mask == MAY_READ) %d || (S_ISDIR
> (inode->i_mode) %d && !(mask && MAY_WRITE) %d\n",capable
> (CAP_DAC_READ_SEARCH),mask == MAY_READ,S_ISDIR(inode->i_mode),!
> (mask && MAY_WRITE));
> - return 0;
> - }
> +
> pvfs2_print("pvfs2_permission: disallowing access\n");
> return -EACCES;
> #endif
> Index: pvfs2_src/src/kernel/linux-2.6/acl.c
> ===================================================================
> --- pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1728)
> +++ pvfs2_src/src/kernel/linux-2.6/acl.c (revision 1729)
> @@ -567,12 +567,14 @@
> return -EACCES;
> }
> /* Do permission checks only for lookups and opens */
> +/*
> if ((mask & MAY_EXEC) == 0) {
> if (!nd || !(nd->flags & (LOOKUP_OPEN | LOOKUP_ACCESS)))
> {
> return 0;
> }
> }
> +*/
> if (current->fsuid == inode->i_uid)
> {
> mode >>= 6;
> Index: pvfs2_src/src/client/sysint/sys-remove.sm
> ===================================================================
> --- pvfs2_src/src/client/sysint/sys-remove.sm (revision 1727)
> +++ pvfs2_src/src/client/sysint/sys-remove.sm (revision 1728)
> @@ -356,7 +356,7 @@
> sm_p->parent_ref.fs_id);
>
> msg_p->fs_id = sm_p->parent_ref.fs_id;
> - msg_p->handle = sm_p->object_ref.handle;
> + msg_p->handle = sm_p->parent_ref.handle;
> msg_p->retry_flag = PVFS_MSGPAIR_NO_RETRY;
> msg_p->comp_fn = remove_crdirent_comp_fn;
>
> @@ -547,7 +547,7 @@
> sm_p->u.remove.object_name);
>
> PRINT_REMOVE_WARNING();
> - js_p->error_code = 0;
> + js_p->error_code = -PVFS_ENOENT;
> return 1;
> }
>
> Index: pvfs2_src/test/shared/test-common.c
> ===================================================================
> --- pvfs2_src/test/shared/test-common.c (revision 1729)
> +++ pvfs2_src/test/shared/test-common.c (revision 1730)
> @@ -819,8 +819,9 @@
> }
> else
> {
> +
> /* Determine if we need to use sudo to change owner/group */
> - if(geteuid() != owner_id ||
> + if(geteuid() != owner_id &&
> geteuid() != group_id)
> {
> if(verbose)
> Index: pvfs2_src/test/client/vfs/test-setgid.c
> ===================================================================
> --- pvfs2_src/test/client/vfs/test-setgid.c (revision 1830)
> +++ pvfs2_src/test/client/vfs/test-setgid.c (revision 1831)
> @@ -464,7 +464,6 @@
>
> if(ret != TEST_COMMON_SUCCESS)
> {
> - print_error("Unable to process arguments\n");
> usage();
> return(-1);
> }
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
More information about the Pvfs2-developers
mailing list