[Pvfs2-users] PVFS2 2.8.1 - batch_create request got: Invalid argument
Asterios Katsifodimos
asteriosk at gmail.com
Mon Apr 6 15:49:09 EDT 2009
Hello Phil,
Yes, they differ!
Name pvfs2-fs
ID 947057450
RootHandle 1048576
Name pvfs2-fs
ID 1529723372
RootHandle 1048576
I was running pvfs2-genconfig with cssh on all of the machines, so each
node ended up with its own generated file. It works after copying the
same file to all the nodes.
Thanks for the pointer, the errors were really misleading...
However, could we state in the documentation that the file
has to be created once and distributed to the nodes?
Thanks a lot for your quick help!
best regards,
Asterios
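
A minimal sketch of the check that resolved this thread: pull the ID out of the <Filesystem> section of each node's config file and confirm that every node has the same value. The function names and the node labels "node_a" and "node_b" are hypothetical; the two ID values are the ones quoted above, and the parsing assumes the simple "ID <number>" layout shown in the config file further down the thread. It is an illustration, not part of PVFS2.

```python
import re


def filesystem_ids(config_text):
    """Return the ID values found inside <Filesystem> sections of a config."""
    ids = []
    in_filesystem = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "<Filesystem>":
            in_filesystem = True
        elif line == "</Filesystem>":
            in_filesystem = False
        elif in_filesystem:
            match = re.fullmatch(r"ID\s+(\d+)", line)
            if match:
                ids.append(int(match.group(1)))
    return ids


def configs_agree(configs):
    """configs maps a node label to its config text; True if all IDs match."""
    seen = {tuple(filesystem_ids(text)) for text in configs.values()}
    return len(seen) == 1


# Hypothetical node labels; the ID values are the two reported in this message.
node_a_conf = """<Filesystem>
    Name pvfs2-fs
    ID 947057450
    RootHandle 1048576
</Filesystem>"""

node_b_conf = """<Filesystem>
    Name pvfs2-fs
    ID 1529723372
    RootHandle 1048576
</Filesystem>"""

if not configs_agree({"node_a": node_a_conf, "node_b": node_b_conf}):
    print("Filesystem IDs differ: generate the config once and copy it to every node")
```

In practice the config text would be read from each node's copy of the file (for example /etc/pvfs2-fs.conf, as used in this thread); a plain "diff" of the files, as Phil suggested, gives the same answer.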
On Mon, Apr 6, 2009 at 10:31 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> I'm running out of places to add log messages to in the code :)
>
> I see a possible cause that I missed before, but we should be able to check
> this one without a patch. Can you do a "diff" of the two configuration
> files and see if they are different in any way? In particular do the ID
> values match?
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
>> No, the systems are identical :)
>>
>> [root at wn140 ~]# hostname
>> wn140.grid.ucy.ac.cy
>> [root at wn140 ~]# uname -a
>> Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
>>
>> [root at wn140 ~]# cat /etc/redhat-release
>> Scientific Linux SL release 4.7 (Beryllium)
>> [root at wn140 pvfs-2.8.1]# more /proc/cpuinfo
>> processor : 0
>> vendor_id : AuthenticAMD
>> cpu family : 15
>> model : 65
>> model name : Dual-Core AMD Opteron(tm) Processor 2214
>> stepping : 2
>> cpu MHz : 2200.000
>> cache size : 1024 KB
>> physical id : 0
>> siblings : 2
>> core id : 0
>> cpu cores : 2
>> fdiv_bug : no
>> hlt_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 1
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l
>>
>>
>>
>> [root at wn141 ~]# hostname
>> wn141.grid.ucy.ac.cy
>> [root at wn141 ~]# uname -a
>> Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
>>
>> [root at wn141 ~]# cat /etc/redhat-release
>> Scientific Linux SL release 4.7 (Beryllium)
>> [root at wn141 pvfs-2.8.1]# more /proc/cpuinfo
>> processor : 0
>> vendor_id : AuthenticAMD
>> cpu family : 15
>> model : 65
>> model name : Dual-Core AMD Opteron(tm) Processor 2214
>> stepping : 2
>> cpu MHz : 2200.000
>> cache size : 1024 KB
>> physical id : 0
>> siblings : 2
>> core id : 0
>> cpu cores : 2
>> fdiv_bug : no
>> hlt_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 1
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
>>
>>
>> Patch applied, logs updated!
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> thanks,
>> Asterios Katsifodimos
>> High Performance Computing systems Lab
>> Department of Computer Science, University of Cyprus
>> http://grid.ucy.ac.cy
>>
>>
>> On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> That didn't show what I expected at all. It must have hit a safety
>> check on the request parameters. Could you try adding in the
>> attached patch as well?
>>
>> What kind of systems are these? Are the two servers different
>> architectures by any chance?
>>
>>
>> thanks,
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Thanks!
>> I have applied the patch.
>>
>> I have replaced the old logs with the new ones. Just use the
>> previous links.
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> thanks a lot for your help,
>> On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Thanks for posting the logs. It looks like the create_list function
>> within Trove actually generated the EINVAL error, but there
>> aren't enough log messages in that path to know why.
>>
>> Any chance you could apply the patch attached to this email and
>> retry this scenario (with verbose logging)? I'm hoping for some
>> extra output after the line that looks like this:
>>
>> (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>>
>>
>> thanks,
>> -Phil
>>
>>
>> Asterios Katsifodimos wrote:
>>
>> Yes, both of them, because both are now metadata servers. When I
>> had one metadata server and one IO server, the metadata server did
>> not produce the errors until the IO server came up. From the moment
>> the IO server comes up, the metadata server goes crazy...
>>
>> I have uploaded the log files here:
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> have a look!
>>
>> thanks!
>> On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Ok. Could you try "verbose" now as the log level? It is close to
>> the "all" level but should only print information while the server
>> is busy.
>>
>> Are both wn140 and wn141 showing the same batch create errors, or
>> just one of them?
>>
>>
>> thanks,
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello Phil,
>>
>> Thanks for your answer.
>> Yes, I delete the storage dir every time I make a new configuration,
>> and I run the pvfs2-server -f command before starting the daemons.
>>
>> The only things that I get from the servers are the batch_create,
>> starting server, and "PVFS2 server got signal 15
>> (server_status_flag: 507903" error messages. Do you want me to try
>> another log level?
>>
>> Also, this is how the server is configured:
>> ***** Displaying PVFS Configuration Information *****
>> ------------------------------------------------------
>> PVFS2 configured to build karma gui : no
>> PVFS2 configured to perform coverage analysis : no
>> PVFS2 configured for aio threaded callbacks : yes
>> PVFS2 configured to use FUSE : no
>> PVFS2 configured for the 2.6.x kernel module : no
>> PVFS2 configured for the 2.4.x kernel module : no
>> PVFS2 configured for using the mmap-ra-cache : no
>> PVFS2 will use workaround for redhat 2.4 kernels : no
>> PVFS2 will use workaround for buggy NPTL : no
>> PVFS2 server will be built : yes
>>
>> PVFS2 version string: 2.8.1
>>
>>
>> thanks again,
>> On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Hello,
>>
>> I'm not sure what would cause that "Invalid argument" error.
>>
>> Could you try the following steps:
>>
>> - kill both servers
>> - modify your configuration files to set "EventLogging" to "none"
>> - delete your old log files (or move them to another directory)
>> - start the servers
>>
>> You can then send us the complete contents of both log files and we
>> can go from there. The "all" level is a little hard to interpret
>> because it generates a lot of information even when servers are idle.
>>
>> Also, when you went from one server to two, did you delete your old
>> storage space (/pvfs) and start over, or are you trying to keep that
>> data and add servers to it?
>>
>> thanks!
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello all,
>>
>> I have been trying to install PVFS 2.8.1 on Ubuntu server, Centos4 and
>> Scientific Linux 4. I compile it and can run it in a "single host"
>> configuration without any problems.
>>
>> However, when I add more nodes to the configuration (always using the
>> pvfs2-genconfig defaults) I have the following problem:
>>
>> *On the metadata node I get these messages:*
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>>
>>
>> *In the IO nodes I get:*
>> [root at wn140 ~]# tail -50 /tmp/pvfs2-server.log
>> [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: 1 operations remain
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: all operations complete
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>> [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec): 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245 23:53:06.225
>> [P 04/02 23:53] Intervals (hr:min:sec) : 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023 00:00:01.020
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read : 0 0 0 0 0 0
>> [P 04/02 23:53] bytes written : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata reads : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata writes : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
>> [P 04/02 23:53] request scheduler : 0 0 0 0 0 0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec): 23:53:12.356 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245
>> [P 04/02 23:53] Intervals (hr:min:sec) : 00:00:01.020 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read : 0 0 0 0 0 0
>> [P 04/02 23:53] bytes written : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata reads : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata writes : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
>> [P 04/02 23:53] request scheduler : 0 0 0 0 0 0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>
>>
>> The metadata node keeps asking for something that the IO nodes cannot
>> provide in the right way, so it complains. As a result, neither the IO
>> nodes nor the metadata node work.
>>
>> I have installed those services many times. I have tested this using
>> Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS, Scientific Linux)
>> and on Ubuntu server.
>>
>> I have also tried PVFS version 2.6.3 and I get the same problem.
>>
>> *My config files look like:*
>> [root at wn140 ~]# more /etc/pvfs2-fs.conf
>> <Defaults>
>> UnexpectedRequests 50
>> EventLogging all
>> EnableTracing no
>> LogStamp datetime
>> BMIModules bmi_tcp
>> FlowModules flowproto_multiqueue
>> PerfUpdateInterval 1000
>> ServerJobBMITimeoutSecs 30
>> ServerJobFlowTimeoutSecs 30
>> ClientJobBMITimeoutSecs 300
>> ClientJobFlowTimeoutSecs 300
>> ClientRetryLimit 5
>> ClientRetryDelayMilliSecs 2000
>> PrecreateBatchSize 512
>> PrecreateLowThreshold 256
>>
>> StorageSpace /pvfs
>> LogFile /tmp/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>> Alias wn140 tcp://wn140:3334
>> Alias wn141 tcp://wn141:3334
>> </Aliases>
>>
>> <Filesystem>
>> Name pvfs2-fs
>> ID 320870944
>> RootHandle 1048576
>> FileStuffing yes
>> <MetaHandleRanges>
>> Range wn140 3-2305843009213693953
>> Range wn141 2305843009213693954-4611686018427387904
>> </MetaHandleRanges>
>> <DataHandleRanges>
>> Range wn140 4611686018427387905-6917529027641081855
>> Range wn141 6917529027641081856-9223372036854775806
>> </DataHandleRanges>
>> <StorageHints>
>> TroveSyncMeta yes
>> TroveSyncData no
>> TroveMethod alt-aio
>> </StorageHints>
>> </Filesystem>
>>
>>
>> My setup consists of two nodes that are both IO and metadata nodes.
>> I have also tried a 4-node setup with 2 I/O and 2 MD nodes, with the
>> same result.
>>
>> Any suggestions?
>>
>> thank you in advance,
>> --
>> Asterios Katsifodimos
>> High Performance Computing systems Lab
>> Department of Computer Science, University of Cyprus
>> http://www.asteriosk.gr
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>