[PVFS2-developers] Re: smallio server death with old kernel

Murali Vilayannur vilayann at mcs.anl.gov
Wed Jan 4 21:26:11 EST 2006


Pete,
Could you try the attached patch and let me know if it fixes the crash?
(There are still some other bugs, e.g. the reported file size is off by
one, but I haven't dug deep into the SMALL_IO code just yet.)

Also, could you look over the encode_skip4 that I added and see whether it
fixes those pesky alignment warnings on 64-bit machines, or whether it is
unnecessary?

Sam wrote the SMALL_IO protocol and he knows it best. I am pretty sure
this is an ugly fix, if it works at all :)
I don't know how/if Rob/Sam want the SMALL_IO protocol to be disabled, since
we need people to use the CVS head version and find the last remaining
bugs (hopefully!). We could add an environment variable or mount-time option
that temporarily disables the small I/O changes, but I think that decision
is up to them.

Thanks,
Murali

On Wed, 4 Jan 2006, Pete Wyckoff wrote:

> pw at osc.edu wrote on Fri, 23 Dec 2005 17:29 -0500:
> > Just a heads up:  I can make this server error happen consistently
> > with today's pvfs2 cvs:
> >
> > [E 16:19:23.007018] ../pvfs2/src/proto/PINT-le-bytefield.c line 572: lebf_decode_req: improper input buffer size
> > [E 16:19:23.007244]     [bt] pvfs2-server [0x808b60a]
> > [E 16:19:23.007258]     [bt] pvfs2-server(PINT_decode+0x230) [0x8089b98]
> > [E 16:19:23.007271]     [bt] pvfs2-server(vfprintf+0x23c8) [0x805d7a8]
> > [E 16:19:23.007283]     [bt] pvfs2-server(main+0x3b5) [0x805be95]
> > [E 16:19:23.007295]     [bt] /lib/i686/libc.so.6(__libc_start_main+0xaa) [0x400a5bba]
> > [E 16:19:23.007307]     [bt] pvfs2-server(shmat+0x41) [0x805b971]
> [..]
> > The two server logs I get with debugmask verbose are attached if
> > they help.  Walking up from a breakpoint on that error line 572
> > shows that we're in a smallio req:
> >
> >     (gdb) p req->op
> >     $8 = PVFS_SERV_SMALL_IO
>
> Did anybody get a chance to look at this?  I worry that it is a side
> effect of the new SMALL_IO protocol addition.  Any way I can disable
> that and see if the problem still occurs?
>
> 		-- Pete
> _______________________________________________
> PVFS2-developers mailing list
> PVFS2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
>
-------------- next part --------------
Index: src/proto/pvfs2-req-proto.h
===================================================================
RCS file: /anoncvs/pvfs2/src/proto/pvfs2-req-proto.h,v
retrieving revision 1.132
diff -u -r1.132 pvfs2-req-proto.h
--- src/proto/pvfs2-req-proto.h	14 Dec 2005 21:50:30 -0000	1.132
+++ src/proto/pvfs2-req-proto.h	5 Jan 2006 03:18:52 -0000
@@ -17,6 +17,7 @@
 #include "pvfs2-request.h"
 #include "pint-request.h"
 #include "pvfs2-mgmt.h"
+#include "gossip.h"
 
 /* update PVFS2_PROTO_MAJOR on wire protocol changes that break backwards
  * compatibility (such as changing the semantics or protocol fields for an
@@ -945,11 +946,11 @@
     struct PINT_Request * file_req;
     PVFS_offset file_req_offset;
     PVFS_size aggregate_size;
-    int segments;
 
     /* these are used for writes to map the regions of the memory buffer
      * to the contiguous encoded message.  They don't get encoded.
      */
+    int segments;
     PVFS_offset offsets[SMALL_IO_MAX_SEGMENTS];
     PVFS_size sizes[SMALL_IO_MAX_SEGMENTS];
 
@@ -958,36 +959,51 @@
 
 #ifdef __PINT_REQPROTO_ENCODE_FUNCS_C
 #define encode_PVFS_servreq_small_io(pptr,x) do { \
+    void *oldptr = (*pptr);\
     encode_PVFS_handle(pptr, &(x)->handle); \
     encode_PVFS_fs_id(pptr, &(x)->fs_id); \
     encode_skip4(pptr,); \
     encode_enum(pptr, &(x)->io_type); \
     encode_uint32_t(pptr, &(x)->server_nr); \
     encode_uint32_t(pptr, &(x)->server_ct); \
+    encode_skip4(pptr,); \
     encode_PINT_dist(pptr, &(x)->dist); \
     encode_PINT_Request(pptr, &(x)->file_req); \
     encode_PVFS_offset(pptr, &(x)->file_req_offset); \
-    encode_PVFS_size(pptr, &(x)->aggregate_size); \
     if ((x)->io_type == PVFS_IO_WRITE) \
     { \
         int i = 0; \
-        for(; i < (x)->segments; ++i) \
+        PVFS_size _aggregate_size = 0;\
+        for (i = 0; i < (x)->segments; ++i) \
         { \
+            _aggregate_size += (x)->sizes[i];\
+        } \
+        (x)->aggregate_size = _aggregate_size;\
+        encode_PVFS_size(pptr, &(x)->aggregate_size); \
+        for (i = 0; i < (x)->segments; ++i) \
+        { \
+            gossip_debug(GOSSIP_ENDECODE_DEBUG, "offsets = %llu, sizes = %llu\n", llu((x)->offsets[i]), llu((x)->sizes[i]));\
             memcpy((*pptr), \
                    (char *)(x)->buffer + ((x)->offsets[i]), \
                    (x)->sizes[i]); \
             (*pptr) += (x)->sizes[i]; \
         } \
     } \
+    else {\
+        encode_PVFS_size(pptr, &(x)->aggregate_size); \
+    }\
+    gossip_debug(GOSSIP_ENDECODE_DEBUG, "%p -> %p %lu\n", oldptr, (*pptr), (unsigned long)(*pptr)- (unsigned long)oldptr);\
 } while (0)
 
 #define decode_PVFS_servreq_small_io(pptr,x) do { \
+    void *oldptr = (*pptr);\
     decode_PVFS_handle(pptr, &(x)->handle); \
     decode_PVFS_fs_id(pptr, &(x)->fs_id); \
     decode_skip4(pptr,); \
     decode_enum(pptr, &(x)->io_type); \
     decode_uint32_t(pptr, &(x)->server_nr); \
     decode_uint32_t(pptr, &(x)->server_ct); \
+    decode_skip4(pptr,); \
     decode_PINT_dist(pptr, &(x)->dist); \
     decode_PINT_Request(pptr, &(x)->file_req); \
     PINT_request_decode((x)->file_req); /* unpacks the pointers */ \
@@ -1002,6 +1018,7 @@
         (x)->buffer = (*pptr); \
         (*pptr) += (x)->aggregate_size; \
     } \
+    gossip_debug(GOSSIP_ENDECODE_DEBUG, "%p -> %p %lu\n", oldptr, (*pptr), (unsigned long)(*pptr) - (unsigned long)oldptr);\
 } while (0)
 #endif
 

