[Pvfs2-users] PVFS2 2.8.1 - batch_create request got: Invalid argument

Phil Carns carns at mcs.anl.gov
Mon Apr 6 15:31:42 EDT 2009


I'm running out of places to add log messages to in the code :)

I see a possible cause that I missed before, but we should be able to 
check this one without a patch.  Can you run "diff" on the two 
configuration files and see if they differ in any way?  In 
particular, do the ID values match?
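
If it helps, here is a minimal sketch of that check in Python. It diffs the text of the two configuration files and compares the values on their "ID" lines. The helper names (fs_ids, compare_configs) and the default labels are assumptions made up for illustration; none of this is part of PVFS2 itself, and a plain "diff" does the same job.

```python
import difflib
import re


def fs_ids(config_text):
    """Return the values found on 'ID <number>' lines of a config file."""
    return re.findall(r"^[ \t]*ID[ \t]+(\d+)[ \t]*$", config_text,
                      flags=re.MULTILINE)


def compare_configs(text_a, text_b, name_a="wn140", name_b="wn141"):
    """Return (ids_match, diff_lines) for the contents of two config files.

    ids_match is True when both files list the same ID values;
    diff_lines is a unified diff (empty when the files are identical).
    """
    diff = list(difflib.unified_diff(
        text_a.splitlines(), text_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm=""))
    return fs_ids(text_a) == fs_ids(text_b), diff
```

To use it, read each server's copy of the configuration file into a string and pass both to compare_configs. A non-empty diff or a False first value means the two servers are not running from the same configuration.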

thanks,
-Phil

Asterios Katsifodimos wrote:
> No, the systems are identical :)
> 
> [root at wn140 ~]# hostname
> wn140.grid.ucy.ac.cy
> [root at wn140 ~]# uname -a
> Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
> [root at wn140 ~]# cat /etc/redhat-release
> Scientific Linux SL release 4.7 (Beryllium)
> [root at wn140 pvfs-2.8.1]# more /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2214
> stepping        : 2
> cpu MHz         : 2200.000
> cache size      : 1024 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l
> 
> 
> 
> [root at wn141 ~]# hostname
> wn141.grid.ucy.ac.cy
> [root at wn141 ~]# uname -a
> Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
> [root at wn141 ~]# cat /etc/redhat-release
> Scientific Linux SL release 4.7 (Beryllium)
> [root at wn141 pvfs-2.8.1]# more /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 15
> model           : 65
> model name      : Dual-Core AMD Opteron(tm) Processor 2214
> stepping        : 2
> cpu MHz         : 2200.000
> cache size      : 1024 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 1
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
> 
> 
> Patch applied, logs updated!
> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
> 
> thanks,
> Asterios Katsifodimos
> High Performance Computing systems Lab
> Department of Computer Science, University of Cyprus
> http://grid.ucy.ac.cy
> 
> 
> On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> 
>     That didn't show what I expected at all.  It must have hit a safety
>     check on the request parameters.  Could you try adding in the
>     attached patch as well?
> 
>     What kind of systems are these?  Are the two servers different
>     architectures by any chance?
> 
> 
>     thanks,
>     -Phil
> 
>     Asterios Katsifodimos wrote:
> 
>         Thanks!
>         I have applied the patch.
> 
>         I have replaced the old logs with the new ones. Just use the
>         previous links.
>         http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>         http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
> 
>         thanks a lot for your help,
>         On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> 
>            Thanks for posting the logs.  It looks like the create_list
>            function within Trove actually generated the EINVAL error, but
>            there aren't enough log messages in that path to know why.
> 
>            Any chance you could apply the patch attached to this email and
>            retry this scenario (with verbose logging)?  I'm hoping for some
>            extra output after the line that looks like this:
> 
>            (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
> 
> 
>            thanks,
>            -Phil
> 
> 
>            Asterios Katsifodimos wrote:
> 
>                Yes, both of them, because both are now metadata servers.
>                When I had one metadata server and one IO server, the
>                metadata server did not produce the errors until the IO
>                server came up. From the moment the IO server comes up, the
>                metadata server goes crazy...
> 
>                I have uploaded the log files here:
>                http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>                http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
> 
>                have a look!
> 
>                thanks!
>                On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> 
>                   Ok.  Could you try "verbose" now as the log level?  It is
>                   close to the "all" level but should only print
>                   information while the server is busy.
> 
>                   Are both wn140 and wn141 showing the same batch create
>                   errors, or just one of them?
> 
> 
>                   thanks,
>                   -Phil
> 
>                   Asterios Katsifodimos wrote:
> 
>                       Hello Phil,
> 
>                       Thanks for your answer.
>                       Yes, I delete the storage dir every time I make a new
>                       configuration, and I run the pvfs2-server -f command
>                       before starting the daemons.
> 
>                       The only things that I get from the servers are the
>                       batch_create, starting server, and the
>                       "PVFS2 server got signal 15 (server_status_flag: 507903"
>                       error messages. Do you want me to try another log level?
> 
>                       Also, this is how the server is configured:
>                       ***** Displaying PVFS Configuration Information *****
>                       ------------------------------------------------------
>                       PVFS2 configured to build karma gui              :  no
>                       PVFS2 configured to perform coverage analysis    :  no
>                       PVFS2 configured for aio threaded callbacks      : yes
>                       PVFS2 configured to use FUSE                     :  no
>                       PVFS2 configured for the 2.6.x kernel module     :  no
>                       PVFS2 configured for the 2.4.x kernel module     :  no
>                       PVFS2 configured for using the mmap-ra-cache     :  no
>                       PVFS2 will use workaround for redhat 2.4 kernels :  no
>                       PVFS2 will use workaround for buggy NPTL         :  no
>                       PVFS2 server will be built                       : yes
> 
>                       PVFS2 version string: 2.8.1
> 
> 
>                       thanks again,
>                       On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> 
>                          Hello,
> 
>                          I'm not sure what would cause that "Invalid
>                          argument" error.
> 
>                          Could you try the following steps:
> 
>                          - kill both servers
>                          - modify your configuration files to set
>                            "EventLogging" to "none"
>                          - delete your old log files (or move them to
>                            another directory)
>                          - start the servers
> 
>                          You can then send us the complete contents of both
>                          log files and we can go from there.  The "all" level
>                          is a little hard to interpret because it generates a
>                          lot of information even when servers are idle.
> 
>                          Also, when you went from one server to two, did you
>                          delete your old storage space (/pvfs) and start
>                          over, or are you trying to keep that data and add
>                          servers to it?
> 
>                          thanks!
>                          -Phil
> 
>                          Asterios Katsifodimos wrote:
> 
>                              Hello all,
> 
>                              I have been trying to install PVFS 2.8.1 on
>                              Ubuntu server, Centos4 and Scientific Linux 4.
>                              I compile it and can run it on a "single host"
>                              configuration without any problems.
> 
>                              However, when I add more nodes to the
>                              configuration (always using the pvfs2-genconfig
>                              defaults) I have the following problem:
> 
>                              *On the metadata node I get these messages:*
>                              [E 04/02 20:16] batch_create request got: Invalid argument
>                              [E 04/02 20:16] batch_create request got: Invalid argument
>                              [E 04/02 20:16] batch_create request got: Invalid argument
>                              [E 04/02 20:16] batch_create request got: Invalid argument
> 
> 
>                              *In the IO nodes I get:*
>                              [root at wn140 ~]# tail -50 /tmp/pvfs2-server.log
>                              [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>                              [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>                              [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>                              [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>                              [D 04/02 23:53]   msgpairarray: 1 operations remain
>                              [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>                              [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>                              [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>                              [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>                              [D 04/02 23:53]   msgpairarray: all operations complete
>                              [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>                              [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>                              [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>                              [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>                              [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>                              [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>                              [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>                              [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>                              [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>                              [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>                              [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>                              [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>                              [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>                              [P 04/02 23:53] Start times (hr:min:sec):  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245  23:53:06.225
>                              [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023  00:00:01.020
>                              [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>                              [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>                              [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>                              [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>                              [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>                              [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>                              [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>                              [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>                              [P 04/02 23:53] Start times (hr:min:sec):  23:53:12.356  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245
>                              [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.020  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023
>                              [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>                              [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>                              [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>                              [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>                              [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>                              [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>                              [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>                              [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
> 
> 
>                              The metadata node keeps asking for something
>                              that the IO nodes cannot provide in the expected
>                              way, so it complains. As a result, neither the IO
>                              nodes nor the metadata node work.
> 
>                              I have installed those services many times. I
>                              have tested this using Berkeley DB 4.2 and 4.3
>                              on Red Hat-based systems (CentOS, Scientific
>                              Linux) and on Ubuntu server.
> 
>                              I have also tried PVFS version 2.6.3 and I get
>                              the same problem.
> 
>                              *My config files look like:*
>                              [root at wn140 ~]# more /etc/pvfs2-fs.conf
>                              <Defaults>
>                                 UnexpectedRequests 50
>                                 EventLogging all
>                                 EnableTracing no
>                                 LogStamp datetime
>                                 BMIModules bmi_tcp
>                                 FlowModules flowproto_multiqueue
>                                 PerfUpdateInterval 1000
>                                 ServerJobBMITimeoutSecs 30
>                                 ServerJobFlowTimeoutSecs 30
>                                 ClientJobBMITimeoutSecs 300
>                                 ClientJobFlowTimeoutSecs 300
>                                 ClientRetryLimit 5
>                                 ClientRetryDelayMilliSecs 2000
>                                 PrecreateBatchSize 512
>                                 PrecreateLowThreshold 256
> 
>                                 StorageSpace /pvfs
>                                 LogFile /tmp/pvfs2-server.log
>                              </Defaults>
> 
>                              <Aliases>
>                                 Alias wn140 tcp://wn140:3334
>                                 Alias wn141 tcp://wn141:3334
>                              </Aliases>
> 
>                              <Filesystem>
>                                 Name pvfs2-fs
>                                 ID 320870944
>                                 RootHandle 1048576
>                                 FileStuffing yes
>                                 <MetaHandleRanges>
>                                     Range wn140 3-2305843009213693953
>                                     Range wn141 2305843009213693954-4611686018427387904
>                                 </MetaHandleRanges>
>                                 <DataHandleRanges>
>                                     Range wn140 4611686018427387905-6917529027641081855
>                                     Range wn141 6917529027641081856-9223372036854775806
>                                 </DataHandleRanges>
>                                 <StorageHints>
>                                     TroveSyncMeta yes
>                                     TroveSyncData no
>                                     TroveMethod alt-aio
>                                 </StorageHints>
>                              </Filesystem>
> 
> 
>                              My setup consists of two nodes that are both IO
>                              and metadata nodes. I have also tried a 4-node
>                              setup with 2 I/O and 2 MD nodes, with the same
>                              result.
> 
>                              Any suggestions?
> 
>                              thank you in advance,
>                              --
>                              Asterios Katsifodimos
>                              High Performance Computing systems Lab
>                              Department of Computer Science, University of Cyprus
>                              http://www.asteriosk.gr
> 
> 


