[Pvfs2-developers] patches: misc. bug fixes

Sam Lang slang at mcs.anl.gov
Thu Nov 8 16:48:55 EST 2007


Hi Phil,

Thanks for the patches.  I committed your patches, except for the  
last one, I think we want to be able to keep doing crdirents  
concurrently.  Also, I made a minor mod to the client-buffer-logging,  
by adding a [INFO]: header to the message written to the client log.   
Hopefully that will prevent users wondering if something failed  
because a message appeared in the log.

-sam


On Oct 3, 2007, at 8:13 AM, Phil Carns wrote:

> pvfs2-aio-cancel.patch
> ----------------------
> This patch fixes a bug in the I/O cleanup path on the server side.   
> In cases where a flow needed to cancel pending I/O operations, the  
> trove cancel function was calling aio_cancel() directly.  This  
> doesn't work correctly if the alt-aio implementation is used.
>
> pvfs2-root-squash-address.patch
> -------------------------------
> This fixes a bug in the root squash checking on the server side.   
> The routine that compares a client address against the root squash  
> list was using getsockname() rather than getpeername().  The former  
> retrieves the server's address rather than the client's.
>
> pvfs2-ls-rm.patch
> -----------------
> This is an interim fix for the concurrent "rm -rf" and "ls" problem  
> that was recently discussed on the mailing list.  It sounds like  
> the long term direction is to switch to using entry names as dirent  
> tokens, but this patch fixes the majority of cases in the mean time  
> without a protocol change.  The problem in the case I was seeing  
> was a cache conflict between the two clients (the ls was caching  
> tokens in the pcache that caused rm to get the wrong position).   
> The token is 64 bits wide, but only the first 32 bits are used (the  
> START and END values are near the top of the 32 bit range).  This  
> patch takes advantage of the extra top 32 bits on the server side  
> to set a unique identifier in the token for each "readdir session"  
> so that their cache entries do not collide.  The client is not  
> aware of this change because it treats the token as an opaque  
> value.  A readdir session begins when a client requests the START  
> position.
>
> pvfs2-client-buffer-logging.patch
> ---------------------------------
> I don't know if there is any interest in this, but this adds some  
> debugging to the buffers used in the kernel module.  On startup,  
> pvfs2-client will print the buffer pointers (whether debugging is  
> enabled or not).  There are also new debugging messages that will  
> show the first byte of each memory buffer passing through the  
> kernel if enabled.  These logging messages were added to help track  
> down what ended up being a server side problem (see pvfs2-aio- 
> cancel.patch), but we kept it in case it is useful in the future.
>
> pvfs2-concurrent-dirent-ops.patch
> ---------------------------------
> I don't know that this is useful for anyone, but we are posting it  
> just in case.  This patch will disable the request scheduler  
> optimization that allows concurrent rmdirent and crdirent  
> operations on a given directory.  When this optimization was  
> originally introduced, we found some problems with consistency if  
> two clients attempted to create and delete the same file name at  
> the same time (sorry, I can't find the mailing list posting on this  
> right now, but I remember discussing it). We don't know if this is  
> still a problem or not, but we have still been running with this  
> optimization disabled as a safety precaution.
>
> -Phil
> Index: pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c
> ===================================================================
> --- pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c	(revision 3997)
> +++ pvfs2_src/src/io/trove/trove-dbpf/dbpf-dspace.c	(revision 3998)
> @@ -1391,8 +1391,13 @@
>              if ((cur_op->op.type == BSTREAM_READ_LIST) ||
>                  (cur_op->op.type == BSTREAM_WRITE_LIST))
>              {
> +#if 0
>                  ret = aio_cancel(cur_op->op.u.b_rw_list.fd,
>                                   cur_op->op.u.b_rw_list.aiocb_array);
> +#endif
> +                ret = cur_op->op.u.b_rw_list.aio_ops->aio_cancel(
> +                    cur_op->op.u.b_rw_list.fd,
> +                    cur_op->op.u.b_rw_list.aiocb_array);
>                  gossip_debug(
>                      GOSSIP_TROVE_DEBUG, "aio_cancel returned %s\n",
>                      ((ret == AIO_CANCELED) ? "CANCELED" :
> diff -Naur pvfs2-with-aio/src/io/dev/pint-dev.c pvfs2/src/io/dev/ 
> pint-dev.c
> --- pvfs2-with-aio/src/io/dev/pint-dev.c	2007-07-20  
> 10:02:32.000000000 -0400
> +++ pvfs2/src/io/dev/pint-dev.c	2007-10-02 10:25:52.000000000 -0400
> @@ -165,6 +165,8 @@
>      uint64_t page_size = sysconf(_SC_PAGE_SIZE), total_size;
>      void *ptr = NULL;
>      int ioctl_cmd[2] = {PVFS_DEV_MAP, 0};
> +    int debug_on = 0;
> +    uint64_t debug_mask = 0;
>
>      for (i = 0; i < ndesc; i++)
>      {
> @@ -207,6 +209,12 @@
>          desc[i].size = params[i].dev_buffer_size;
>          desc[i].count = params[i].dev_buffer_count;
>
> +        gossip_get_debug_mask(&debug_on, &debug_mask);
> +        gossip_set_debug_mask(1, GOSSIP_DEV_DEBUG);
> +        gossip_debug(GOSSIP_DEV_DEBUG,
> +            "Mapping pointer %p for I/O.\n", ptr);
> +        gossip_set_debug_mask(debug_on, debug_mask);
> +
>          /* ioctl to ask driver to map pages if needed */
>          if (ioctl_cmd[i] != 0)
>          {
> diff -Naur pvfs2-with-aio/src/kernel/linux-2.6/pvfs2-bufmap.c pvfs2/ 
> src/kernel/linux-2.6/pvfs2-bufmap.c
> --- pvfs2-with-aio/src/kernel/linux-2.6/pvfs2-bufmap.c	2007-08-19  
> 14:20:27.000000000 -0400
> +++ pvfs2/src/kernel/linux-2.6/pvfs2-bufmap.c	2007-10-02  
> 10:25:52.000000000 -0400
> @@ -530,6 +530,8 @@
>      void __user *offset = from;
>      void *to_kaddr = NULL;
>      struct pvfs_bufmap_desc *to = &desc_array[buffer_index];
> +    char* tmp_printer = NULL;
> +    int tmp_int = 0;
>
>      gossip_debug(GOSSIP_BUFMAP_DEBUG, "pvfs_bufmap_copy_from_user:  
> from %p, index %d, "
>                  "size %zd\n", from, buffer_index, size);
> @@ -550,6 +552,13 @@
>
>          to_kaddr = pvfs2_kmap(to->page_array[index]);
>          ret = copy_from_user(to_kaddr, offset, cur_copy_size);
> +        if(!tmp_printer)
> +        {
> +            tmp_printer = (char*)(to_kaddr);
> +            tmp_int += tmp_printer[0];
> +            gossip_debug(GOSSIP_BUFMAP_DEBUG, "First character  
> (integer value) in pvfs_bufmap_copy_from_user: %d\n", tmp_int);
> +        }
> +
>          pvfs2_kunmap(to->page_array[index]);
>
>          if (ret)
> @@ -726,6 +735,8 @@
>      struct iovec *copied_iovec = NULL;
>      struct pvfs_bufmap_desc *to = &desc_array[buffer_index];
>      unsigned int seg, page_offset = 0;
> +    char* tmp_printer = NULL;
> +    int tmp_int = 0;
>
>      gossip_debug(GOSSIP_BUFMAP_DEBUG,  
> "pvfs_bufmap_copy_iovec_from_user: index %d, "
>                  "size %zd\n", buffer_index, size);
> @@ -799,6 +810,14 @@
>          }
>          to_kaddr = pvfs2_kmap(to->page_array[index]);
>          ret = copy_from_user(to_kaddr + page_offset, from_addr,  
> cur_copy_size);
> +        if(!tmp_printer)
> +        {
> +            tmp_printer = (char*)(to_kaddr+page_offset);
> +            tmp_int += tmp_printer[0];
> +            gossip_debug(GOSSIP_BUFMAP_DEBUG, "First character  
> (integer value) in pvfs_bufmap_copy_from_user: %d\n", tmp_int);
> +        }
> +
> +
>          pvfs2_kunmap(to->page_array[index]);
>  #if 0
>          gossip_debug(GOSSIP_BUFMAP_DEBUG,  
> "pvfs2_bufmap_copy_iovec_from_user: copying from user %p to kernel % 
> p %zd bytes (to_kddr: %p,page_offset: %d)\n",
> @@ -961,6 +980,8 @@
>      struct iovec *copied_iovec = NULL;
>      struct pvfs_bufmap_desc *from = &desc_array[buffer_index];
>      unsigned int seg, page_offset = 0;
> +    char* tmp_printer = NULL;
> +    int tmp_int = 0;
>
>      gossip_debug(GOSSIP_BUFMAP_DEBUG,  
> "pvfs_bufmap_copy_to_user_iovec: index %d, "
>                  "size %zd\n", buffer_index, size);
> @@ -1034,6 +1055,12 @@
>              inc_index = 1;
>          }
>          from_kaddr = pvfs2_kmap(from->page_array[index]);
> +        if(!tmp_printer)
> +        {
> +            tmp_printer = (char*)(from_kaddr + page_offset);
> +            tmp_int += tmp_printer[0];
> +            gossip_debug(GOSSIP_BUFMAP_DEBUG, "First character  
> (integer value) in pvfs_bufmap_copy_to_user_iovec: %d\n", tmp_int);
> +        }
>          ret = copy_to_user(to_addr, from_kaddr + page_offset,  
> cur_copy_size);
>          pvfs2_kunmap(from->page_array[index]);
>  #if 0
> Index: pvfs2_src/src/server/request-scheduler/request-scheduler.c
> ===================================================================
> --- pvfs2_src/src/server/request-scheduler/request-scheduler.c	 
> (revision 2537)
> +++ pvfs2_src/src/server/request-scheduler/request-scheduler.c	 
> (revision 2538)
> @@ -619,6 +619,7 @@
>                  ret = 0;
>              }
>          }
> +#if 0
>          else if((in_request->op == PVFS_SERV_CRDIRENT ||
>                  in_request->op == PVFS_SERV_RMDIRENT) &&
>                  next_element->state == REQ_SCHEDULED &&
> @@ -656,6 +657,7 @@
>                  ret = 0;
>              }
>          }
> +#endif
>  	else
>  	{
>  	    tmp_element->state = REQ_QUEUED;
> Index: pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c
> ===================================================================
> --- pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c	(revision 4195)
> +++ pvfs2_src/src/io/trove/trove-dbpf/dbpf-keyval.c	(revision 4196)
> @@ -38,6 +38,8 @@
>  #include "pvfs2-internal.h"
>  #include "pint-perf-counter.h"
>
> +static uint32_t readdir_session = 0;
> +
>  extern int synccount;
>
>  /**
> @@ -670,6 +672,7 @@
>  static int dbpf_keyval_iterate_op_svc(struct dbpf_op *op_p)
>  {
>      int count, ret;
> +    uint64_t tmp_pos = 0;
>
>      assert(*op_p->u.k_iterate.count_p > 0);
>
> @@ -713,6 +716,10 @@
>          if(*op_p->u.k_iterate.position_p == TROVE_ITERATE_START)
>          {
>              *op_p->u.k_iterate.position_p = count;
> +            /* store a session identifier in the top 32 bits */
> +            tmp_pos += readdir_session;
> +            *op_p->u.k_iterate.position_p += (tmp_pos << 32);
> +            readdir_session++;
>          }
>          else
>          {
> @@ -1454,7 +1461,10 @@
>           * we fall back to stepping through all the entries to get
>           * to the position
>           */
> -
> +        /* strip the session out of the position; we need to use a  
> true
> +         * integer offset if we get past the cache
> +         */
> +        pos = pos & 0xffff;
>          return dbpf_keyval_iterate_step_to_position(handle, pos,  
> dbc_p);
>      }
>
> Index: pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c
> ===================================================================
> --- pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 4234)
> +++ pvfs2_src/src/io/bmi/bmi_tcp/bmi-tcp.c	(revision 4235)
> @@ -1708,9 +1708,15 @@
>      struct sockaddr_in map_addr;
>      socklen_t map_addr_len = sizeof(map_addr);
>      const char *tcp_wildcard = wildcard_string + 6 /* strlen 
> ("tcp://") */;
> +    int ret = -1;
>
>      memset(&map_addr, 0, sizeof(map_addr));
> -    getsockname(tcp_addr_data->socket, (struct sockaddr *)  
> &map_addr, &map_addr_len);
> +    if(getpeername(tcp_addr_data->socket, (struct sockaddr *)  
> &map_addr, &map_addr_len) < 0)
> +    {
> +        ret =  bmi_tcp_errno_to_pvfs(-EINVAL);
> +        gossip_err("Error: failed to retrieve peer name for client. 
> \n");
> +        return(ret);
> +    }
>      /* Wildcard specification */
>      if (netmask == -1)
>      {
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers



More information about the Pvfs2-developers mailing list