[Pvfs2-users] PVFS2 2.8.1 - batch_create request got: Invalid argument

Asterios Katsifodimos asteriosk at gmail.com
Mon Apr 6 15:17:17 EDT 2009


No, the systems are identical :)

[root at wn140 ~]# hostname
wn140.grid.ucy.ac.cy
[root at wn140 ~]# uname -a
Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[root at wn140 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[root at wn140 pvfs-2.8.1]# more /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2214
stepping        : 2
cpu MHz         : 2200.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l



[root at wn141 ~]# hostname
wn141.grid.ucy.ac.cy
[root at wn141 ~]# uname -a
Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[root at wn141 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[root at wn141 pvfs-2.8.1]# more /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 2214
stepping        : 2
cpu MHz         : 2200.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
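
A comparison like this can also be done mechanically by diffing the "key : value" fields of the two /proc/cpuinfo dumps. The following is only a minimal sketch: the helper names parse_cpuinfo and diff_cpuinfo are made up for illustration, and it skips the flags field by default because the wn140 paste above ends in "l" while the wn141 paste ends in "lm", which looks like a copy/paste difference rather than a hardware one.

```python
def parse_cpuinfo(text):
    """Parse 'key : value' lines from /proc/cpuinfo text into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and wrapped continuation lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def diff_cpuinfo(a_text, b_text, ignore=("flags",)):
    """Return {key: (a_value, b_value)} for every field that differs."""
    a, b = parse_cpuinfo(a_text), parse_cpuinfo(b_text)
    keys = (set(a) | set(b)) - set(ignore)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Two short excerpts taken from the dumps above.
    wn140 = "model name      : Dual-Core AMD Opteron(tm) Processor 2214\ncpu MHz         : 2200.000\n"
    wn141 = "model name      : Dual-Core AMD Opteron(tm) Processor 2214\ncpu MHz         : 2200.000\n"
    print(diff_cpuinfo(wn140, wn141))  # an empty dict means no differing fields
```

An empty result only says the listed fields match; it says nothing about kernel modules, libraries, or PVFS2 build options on the two hosts.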


Patch applied, logs updated!
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
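
Since the verbose logs are long, a small filter can help pull out just the error lines and count how often each message repeats. This is a rough sketch, not a PVFS2 tool: the function name count_messages is hypothetical, and the regular expression assumes the "[E 04/02 20:16] message" line layout seen in the log excerpts quoted in this thread.

```python
import re
from collections import Counter

# Matches lines such as "[E 04/02 20:16] batch_create request got: Invalid argument".
# The layout is inferred from the excerpts in this thread (LogStamp datetime).
LOG_LINE = re.compile(r"^\[([A-Z]) (\d{2}/\d{2} \d{2}:\d{2})\] (.*)$")


def count_messages(lines, level="E"):
    """Count distinct messages logged at the given level letter."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group(1) == level:
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[E 04/02 20:16] batch_create request got: Invalid argument",
        "[E 04/02 20:16] batch_create request got: Invalid argument",
        "[D 04/02 23:53] msgpairarray: 1 operations remain",
    ]
    for message, n in count_messages(sample).most_common():
        print(n, message)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_messages(open("/tmp/pvfs2-server.log"))`, using the LogFile path from the configuration quoted below.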

thanks,
Asterios Katsifodimos
High Performance Computing systems Lab
Department of Computer Science, University of Cyprus
http://grid.ucy.ac.cy


On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <carns at mcs.anl.gov> wrote:

> That didn't show what I expected at all.  It must have hit a safety check
> on the request parameters.  Could you try adding in the attached patch as
> well?
>
> What kind of systems are these?  Are the two servers different
> architectures by any chance?
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
>> Thanks!
>> I have applied the patch.
>>
>> I have replaced the old logs with the new ones. Just use the previous
>> links.
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> thanks a lot for your help,
>> On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>>    Thanks for posting the logs.  It looks like the create_list function
>>    within Trove actually generated the EINVAL error, but there
>>    aren't enough log messages in that path to know why.
>>
>>    Any chance you could apply the patch attached to this email and
>>    retry this scenario (with verbose logging)?  I'm hoping for some
>>    extra output after the line that looks like this:
>>
>>    (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>>
>>
>>    thanks,
>>    -Phil
>>
>>
>>    Asterios Katsifodimos wrote:
>>
>>        Yes, both of them, because both are now metadata servers. When I
>>        had one metadata server and one IO server, the metadata server
>>        did not produce the errors until the IO server came up.
>>        From the moment the IO server comes up, the metadata server
>>        goes crazy...
>>
>>        I have uploaded the log files here:
>>        http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>>        http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>>        have a look!
>>
>>        thanks!
>>        On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>>           Ok.  Could you try "verbose" now as the log level?  It is close to
>>           the "all" level but should only print information while the server
>>           is busy.
>>
>>           Are both wn140 and wn141 showing the same batch create errors, or
>>           just one of them?
>>
>>
>>           thanks,
>>           -Phil
>>
>>           Asterios Katsifodimos wrote:
>>
>>               Hello Phil,
>>
>>               Thanks for your answer.
>>               Yes, I delete the storage dir every time I make a new configuration,
>>               and I run the pvfs2-server -f command before starting the daemons.
>>
>>               The only things that I get from the servers are the batch_create,
>>               starting server, and "PVFS2 server got signal 15
>>               (server_status_flag: 507903" messages. Do you want me to try
>>               another log level?
>>
>>               Also, this is how the server is configured:
>>               ***** Displaying PVFS Configuration Information *****
>>               ------------------------------------------------------
>>               PVFS2 configured to build karma gui               :  no
>>               PVFS2 configured to perform coverage analysis     :  no
>>               PVFS2 configured for aio threaded callbacks       : yes
>>               PVFS2 configured to use FUSE                      :  no
>>               PVFS2 configured for the 2.6.x kernel module      :  no
>>               PVFS2 configured for the 2.4.x kernel module      :  no
>>               PVFS2 configured for using the mmap-ra-cache      :  no
>>               PVFS2 will use workaround for redhat 2.4 kernels  :  no
>>               PVFS2 will use workaround for buggy NPTL          :  no
>>               PVFS2 server will be built                        : yes
>>
>>               PVFS2 version string: 2.8.1
>>
>>
>>               thanks again,
>>               On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>>                  Hello,
>>
>>                  I'm not sure what would cause that "Invalid argument" error.
>>
>>                  Could you try the following steps:
>>
>>                  - kill both servers
>>                  - modify your configuration files to set "EventLogging" to "none"
>>                  - delete your old log files (or move them to another directory)
>>                  - start the servers
>>
>>                  You can then send us the complete contents of both log files and we
>>                  can go from there.  The "all" level is a little hard to interpret
>>                  because it generates a lot of information even when servers are idle.
>>
>>                  Also, when you went from one server to two, did you delete your old
>>                  storage space (/pvfs) and start over, or are you trying to keep that
>>                  data and add servers to it?
>>
>>                  thanks!
>>                  -Phil
>>
>>                  Asterios Katsifodimos wrote:
>>
>>                      Hello all,
>>
>>                      I have been trying to install PVFS 2.8.1 on Ubuntu server,
>>                      Centos4 and Scientific Linux 4. I compile it and can run it
>>                      in a "single host" configuration without any problems.
>>
>>                      However, when I add more nodes to the configuration (always
>>                      using the pvfs2-genconfig defaults) I have the following problem:
>>
>>                      *On the metadata node I get these messages:*
>>                      [E 04/02 20:16] batch_create request got: Invalid argument
>>                      [E 04/02 20:16] batch_create request got: Invalid argument
>>                      [E 04/02 20:16] batch_create request got: Invalid argument
>>                      [E 04/02 20:16] batch_create request got: Invalid argument
>>
>>
>>                      *In the IO nodes I get:*
>>                      [root at wn140 ~]# tail -50 /tmp/pvfs2-server.log
>>                      [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>>                      [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>>                      [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>                      [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>>                      [D 04/02 23:53]   msgpairarray: 1 operations remain
>>                      [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>>                      [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>>                      [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>                      [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>>                      [D 04/02 23:53]   msgpairarray: all operations complete
>>                      [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>>                      [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>>                      [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>                      [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>>                      [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>>                      [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>>                      [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>>                      [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>>                      [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>>                      [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>                      [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>>                      [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>>                      [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>>                      [P 04/02 23:53] Start times (hr:min:sec):  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245  23:53:06.225
>>                      [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023  00:00:01.020
>>                      [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>>                      [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>>                      [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>>                      [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>>                      [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>>                      [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>>                      [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>                      [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>>                      [P 04/02 23:53] Start times (hr:min:sec):  23:53:12.356  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245
>>                      [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.020  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023
>>                      [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>>                      [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>>                      [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>>                      [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>>                      [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>>                      [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>>                      [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>>                      [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>
>>
>>                      The metadata node keeps asking for something that the IO
>>                      nodes cannot provide correctly, so it complains. As a result,
>>                      neither the IO nodes nor the metadata node work.
>>
>>                      I have installed those services many times. I have tested this
>>                      using Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS,
>>                      Scientific Linux) and on Ubuntu server.
>>
>>                      I have also tried PVFS version 2.6.3 and I get the same problem.
>>
>>                      *My config files look like:*
>>                      [root at wn140 ~]# more /etc/pvfs2-fs.conf
>>                      <Defaults>
>>                         UnexpectedRequests 50
>>                         EventLogging all
>>                         EnableTracing no
>>                         LogStamp datetime
>>                         BMIModules bmi_tcp
>>                         FlowModules flowproto_multiqueue
>>                         PerfUpdateInterval 1000
>>                         ServerJobBMITimeoutSecs 30
>>                         ServerJobFlowTimeoutSecs 30
>>                         ClientJobBMITimeoutSecs 300
>>                         ClientJobFlowTimeoutSecs 300
>>                         ClientRetryLimit 5
>>                         ClientRetryDelayMilliSecs 2000
>>                         PrecreateBatchSize 512
>>                         PrecreateLowThreshold 256
>>
>>                         StorageSpace /pvfs
>>                         LogFile /tmp/pvfs2-server.log
>>                      </Defaults>
>>
>>                      <Aliases>
>>                         Alias wn140 tcp://wn140:3334
>>                         Alias wn141 tcp://wn141:3334
>>                      </Aliases>
>>
>>                      <Filesystem>
>>                         Name pvfs2-fs
>>                         ID 320870944
>>                         RootHandle 1048576
>>                         FileStuffing yes
>>                         <MetaHandleRanges>
>>                             Range wn140 3-2305843009213693953
>>                             Range wn141 2305843009213693954-4611686018427387904
>>                         </MetaHandleRanges>
>>                         <DataHandleRanges>
>>                             Range wn140 4611686018427387905-6917529027641081855
>>                             Range wn141 6917529027641081856-9223372036854775806
>>                         </DataHandleRanges>
>>                         <StorageHints>
>>                             TroveSyncMeta yes
>>                             TroveSyncData no
>>                             TroveMethod alt-aio
>>                         </StorageHints>
>>                      </Filesystem>
>>
>>
>>                      My setup consists of two nodes that are both IO and metadata
>>                      nodes. I have also tried a 4-node setup with 2 IO and 2 metadata
>>                      nodes, with the same result.
>>
>>                      Any suggestions?
>>
>>                      thank you in advance,
>>                      --
>>                      Asterios Katsifodimos
>>                      High Performance Computing systems Lab
>>                      Department of Computer Science, University of Cyprus
>>                      http://www.asteriosk.gr
>>
>>
>>
>
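
For reference, the MetaHandleRanges and DataHandleRanges in the quoted pvfs2-fs.conf can be sanity-checked for overlaps with a few lines of Python. This is only an illustrative sketch: parse_ranges and find_overlaps are hypothetical helper names, the parser only understands "Range <alias> <low>-<high>" lines as they appear in that file, and a clean result does not explain the batch_create EINVAL; it merely rules out an overlapping-range typo in the configuration.

```python
import re

# Matches config lines such as "Range wn140 3-2305843009213693953".
RANGE_LINE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$")


def parse_ranges(conf_text):
    """Return a list of (alias, low, high) tuples for every Range line."""
    ranges = []
    for line in conf_text.splitlines():
        match = RANGE_LINE.match(line)
        if match:
            ranges.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return ranges


def find_overlaps(ranges):
    """Return pairs of ranges whose handle intervals overlap."""
    problems = []
    widest = None  # range with the highest upper bound seen so far
    for cur in sorted(ranges, key=lambda r: (r[1], r[2])):
        if widest is not None and cur[1] <= widest[2]:
            problems.append((widest, cur))
        if widest is None or cur[2] > widest[2]:
            widest = cur
    return problems


if __name__ == "__main__":
    conf = """
        Range wn140 3-2305843009213693953
        Range wn141 2305843009213693954-4611686018427387904
        Range wn140 4611686018427387905-6917529027641081855
        Range wn141 6917529027641081856-9223372036854775806
    """
    print(find_overlaps(parse_ranges(conf)))  # an empty list means no overlaps
```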