[Pvfs2-users] PVFS2 2.8.1 - batch_create request got: Invalid argument
Phil Carns
carns at mcs.anl.gov
Mon Apr 6 16:37:39 EDT 2009
Hi Asterios,
Whew- I'm glad we got that figured out. I apologize for the obtuse
error message. We should at least print something more helpful in that
case, but I will check the documentation too.
-Phil
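
The check that resolved this thread (do the filesystem ID values in each server's configuration file match?) can be scripted. What follows is only a minimal sketch, not part of PVFS2: the function names and the host labels "host_a" and "host_b" are made up for illustration, and the parser assumes the <Filesystem> layout shown in the configuration file quoted further down. The two ID values are the ones reported in the reply below; the thread does not say which server had which.

```python
import re
from collections import defaultdict


def filesystem_ids(config_text):
    """Return {filesystem name: ID} for every <Filesystem> block in the text."""
    ids = {}
    blocks = re.findall(r"<Filesystem>(.*?)</Filesystem>", config_text, re.S | re.I)
    for block in blocks:
        name = re.search(r"^\s*Name\s+(\S+)", block, re.M)
        fsid = re.search(r"^\s*ID\s+(\d+)", block, re.M)
        if name and fsid:
            ids[name.group(1)] = int(fsid.group(1))
    return ids


def find_id_mismatches(configs):
    """configs maps a host label to its config text.

    Returns {filesystem name: {host: ID}} for filesystems whose ID is not
    identical on every host that defines them.
    """
    seen = defaultdict(dict)
    for host, text in configs.items():
        for name, fsid in filesystem_ids(text).items():
            seen[name][host] = fsid
    return {name: hosts for name, hosts in seen.items()
            if len(set(hosts.values())) > 1}


if __name__ == "__main__":
    # Hypothetical inputs built from the values quoted in this thread.
    template = ("<Filesystem>\n Name pvfs2-fs\n ID {fsid}\n"
                " RootHandle 1048576\n</Filesystem>\n")
    configs = {
        "host_a": template.format(fsid=947057450),
        "host_b": template.format(fsid=1529723372),
    }
    for name, hosts in find_id_mismatches(configs).items():
        print("filesystem", name, "has differing IDs:", hosts)
```

In practice one would read each server's /etc/pvfs2-fs.conf into the dictionary instead of the inline strings. An empty result means the IDs agree; a plain "diff" of the files remains the stricter check, since it also catches differences outside the ID line.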
Asterios Katsifodimos wrote:
> Hello Phil,
>
> Yes, they differ!
> Name pvfs2-fs
> ID 947057450
> RootHandle 1048576
>
> Name pvfs2-fs
> ID 1529723372
> RootHandle 1048576
>
>
> I was running pvfs2-genconfig with cssh on all of the machines, so each
> node generated its own file. It works when I copy the same file to all the nodes.
>
> Thanks for the pointer, the errors were really misleading...
>
> However, could we state in the documentation that the file
> has to be created once and distributed to the nodes?
>
> Thanks a lot for your quick help!
>
> best regards,
> Asterios
>
> On Mon, Apr 6, 2009 at 10:31 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>
> I'm running out of places to add log messages to in the code :)
>
> I see a possible cause that I missed before, but we should be able
> to check this one without a patch. Can you do a "diff" of the two
> configuration files and see if they are different in any way? In
> particular do the ID values match?
>
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
> No, the systems are identical :)
>
> [root at wn140 ~]# hostname
> wn140.grid.ucy.ac.cy
> [root at wn140 ~]# uname -a
> Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
>
> [root at wn140 ~]# cat /etc/redhat-release
> Scientific Linux SL release 4.7 (Beryllium)
> [root at wn140 pvfs-2.8.1]# more /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 65
> model name : Dual-Core AMD Opteron(tm) Processor 2214
> stepping : 2
> cpu MHz : 2200.000
> cache size : 1024 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 2
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l
>
>
>
> [root at wn141 ~]# hostname
> wn141.grid.ucy.ac.cy
> [root at wn141 ~]# uname -a
> Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
>
> [root at wn141 ~]# cat /etc/redhat-release
> Scientific Linux SL release 4.7 (Beryllium)
> [root at wn141 pvfs-2.8.1]# more /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 65
> model name : Dual-Core AMD Opteron(tm) Processor 2214
> stepping : 2
> cpu MHz : 2200.000
> cache size : 1024 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 2
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
>
>
> Patch applied, logs updated!
> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>
> thanks,
> Asterios Katsifodimos
> High Performance Computing systems Lab
> Department of Computer Science, University of Cyprus
> http://grid.ucy.ac.cy
>
>
> On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>
> That didn't show what I expected at all. It must have hit a safety
> check on the request parameters. Could you try adding in the
> attached patch as well?
>
> What kind of systems are these? Are the two servers different
> architectures by any chance?
>
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
> Thanks!
> I have applied the patch.
>
> I have replaced the old logs with the new ones. Just use the
> previous links.
> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>
> thanks a lot for your help,
> On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>
> Thanks for posting the logs. It looks like the create_list function
> within Trove actually generated the EINVAL error, but there
> aren't enough log messages in that path to know why.
>
> Any chance you could apply the patch attached to this email and
> retry this scenario (with verbose logging)? I'm hoping for some
> extra output after the line that looks like this:
>
> (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>
>
> thanks,
> -Phil
>
>
> Asterios Katsifodimos wrote:
>
> Yes, both of them, because now both are metadata servers. When I
> had one metadata server and one IO server, the metadata server did not
> produce the errors until the IO server came up. From the moment the
> IO server comes up, the metadata server goes crazy...
>
> I have uploaded the log files here:
>
> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>
> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>
> have a look!
>
> thanks!
> On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>
> Ok. Could you try "verbose" now as the log level? It is close to
> the "all" level but should only print information while the server
> is busy.
>
> Are both wn140 and wn141 showing the same batch create errors, or
> just one of them?
>
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
> Hello Phil,
>
> Thanks for your answer.
> Yes, I delete the storage dir every time I make a new configuration,
> and I run the pvfs2-server -f command before starting the daemons.
>
> The only things that I get from the servers are the batch_create,
> starting server, and "PVFS2 server got signal 15
> (server_status_flag: 507903" messages. Do you want me to try
> another log level?
>
> Also, this is how the server is configured:
> ***** Displaying PVFS Configuration Information *****
> ------------------------------------------------------
> PVFS2 configured to build karma gui              : no
> PVFS2 configured to perform coverage analysis    : no
> PVFS2 configured for aio threaded callbacks      : yes
> PVFS2 configured to use FUSE                     : no
> PVFS2 configured for the 2.6.x kernel module     : no
> PVFS2 configured for the 2.4.x kernel module     : no
> PVFS2 configured for using the mmap-ra-cache     : no
> PVFS2 will use workaround for redhat 2.4 kernels : no
> PVFS2 will use workaround for buggy NPTL         : no
> PVFS2 server will be built                       : yes
>
> PVFS2 version string: 2.8.1
>
>
> thanks again,
> On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>
> Hello,
>
> I'm not sure what would cause that "Invalid argument" error.
>
> Could you try the following steps:
>
> - kill both servers
> - modify your configuration files to set "EventLogging" to "none"
> - delete your old log files (or move them to another directory)
> - start the servers
>
> You can then send us the complete contents of both log files and we
> can go from there. The "all" level is a little hard to interpret
> because it generates a lot of information even when servers are idle.
>
> Also, when you went from one server to two, did you delete your old
> storage space (/pvfs) and start over, or are you trying to keep that
> data and add servers to it?
>
> thanks!
> -Phil
>
> Asterios Katsifodimos wrote:
>
> Hello all,
>
> I have been trying to install PVFS 2.8.1 on Ubuntu server, CentOS 4 and
> Scientific Linux 4. I compile it and can run it in a "single host"
> configuration without any problems.
>
> However, when I add more nodes to the configuration (always using the
> pvfs2-genconfig defaults) I have the following problem:
>
> *On the metadata node I get these messages:*
> [E 04/02 20:16] batch_create request got: Invalid argument
> [E 04/02 20:16] batch_create request got: Invalid argument
> [E 04/02 20:16] batch_create request got: Invalid argument
> [E 04/02 20:16] batch_create request got: Invalid argument
>
>
> *In the IO nodes I get:*
> [root at wn140 ~]# tail -50 /tmp/pvfs2-server.log
> [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
> [D 04/02 23:53] msgpairarray: 1 operations remain
> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
> [D 04/02 23:53] msgpairarray: all operations complete
> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
> [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
> [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
> [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
> [P 04/02 23:53] Start times (hr:min:sec): 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245 23:53:06.225
> [P 04/02 23:53] Intervals (hr:min:sec) : 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023 00:00:01.020
> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
> [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
> [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
> [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
> [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
> [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
> [P 04/02 23:53] Start times (hr:min:sec): 23:53:12.356 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245
> [P 04/02 23:53] Intervals (hr:min:sec) : 00:00:01.020 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023
> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
> [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
> [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
> [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
> [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
> [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>
>
> The metadata node keeps asking for something that the IO nodes
> cannot give the right way, so it complains. This stops both the IO
> nodes and the metadata node from working.
>
> I have installed those services many times. I have tested this
> using Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS, Scientific
> Linux) and on Ubuntu server.
>
> I have also tried PVFS version 2.6.3 and I get the same problem.
>
> *My config files look like:*
> [root at wn140 ~]# more /etc/pvfs2-fs.conf
> <Defaults>
> UnexpectedRequests 50
> EventLogging all
> EnableTracing no
> LogStamp datetime
> BMIModules bmi_tcp
> FlowModules flowproto_multiqueue
> PerfUpdateInterval 1000
> ServerJobBMITimeoutSecs 30
> ServerJobFlowTimeoutSecs 30
> ClientJobBMITimeoutSecs 300
> ClientJobFlowTimeoutSecs 300
> ClientRetryLimit 5
> ClientRetryDelayMilliSecs 2000
> PrecreateBatchSize 512
> PrecreateLowThreshold 256
>
> StorageSpace /pvfs
> LogFile /tmp/pvfs2-server.log
> </Defaults>
>
> <Aliases>
> Alias wn140 tcp://wn140:3334
> Alias wn141 tcp://wn141:3334
> </Aliases>
>
> <Filesystem>
> Name pvfs2-fs
> ID 320870944
> RootHandle 1048576
> FileStuffing yes
> <MetaHandleRanges>
> Range wn140 3-2305843009213693953
> Range wn141 2305843009213693954-4611686018427387904
> </MetaHandleRanges>
> <DataHandleRanges>
> Range wn140 4611686018427387905-6917529027641081855
> Range wn141 6917529027641081856-9223372036854775806
> </DataHandleRanges>
> <StorageHints>
> TroveSyncMeta yes
> TroveSyncData no
> TroveMethod alt-aio
> </StorageHints>
> </Filesystem>
>
>
> My setup is made from two nodes that are both IO and metadata
> nodes. I have also tried a 4-node setup with 2 I/O and 2 MD nodes,
> resulting in the same thing.
>
> Any suggestions?
>
> thank you in advance,
> --
> Asterios Katsifodimos
> High Performance Computing systems Lab
> Department of Computer Science, University of Cyprus
> http://www.asteriosk.gr
>
>
>
> ------------------------------------------------------------------------
>
>
> _______________________________________________
> Pvfs2-users mailing list
> Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>>>
>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>
> <mailto:Pvfs2-users at beowulf-underground.org
> <mailto:Pvfs2-users at beowulf-underground.org>>>>>
>
>
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users