[Pvfs2-users] PVFS2 2.8.1 - batch_create request got: Invalid argument
Asterios Katsifodimos
asteriosk at gmail.com
Mon Apr 6 15:17:17 EDT 2009
No, the systems are identical :)
[root at wn140 ~]# hostname
wn140.grid.ucy.ac.cy
[root at wn140 ~]# uname -a
Linux wn140.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[root at wn140 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[root at wn140 pvfs-2.8.1]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2214
stepping : 2
cpu MHz : 2200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp l
[root at wn141 ~]# hostname
wn141.grid.ucy.ac.cy
[root at wn141 ~]# uname -a
Linux wn141.grid.ucy.ac.cy 2.6.9-78.0.13.ELsmp #1 SMP Wed Jan 14 19:07:47 CST 2009 i686 athlon i386 GNU/Linux
[root at wn141 ~]# cat /etc/redhat-release
Scientific Linux SL release 4.7 (Beryllium)
[root at wn141 pvfs-2.8.1]# more /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2214
stepping : 2
cpu MHz : 2200.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni syscall nx mmxext fxsr_opt rdtscp lm
Patch applied, logs updated!
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
thanks,
Asterios Katsifodimos
High Performance Computing systems Lab
Department of Computer Science, University of Cyprus
http://grid.ucy.ac.cy
On Mon, Apr 6, 2009 at 10:03 PM, Phil Carns <carns at mcs.anl.gov> wrote:
> That didn't show what I expected at all. It must have hit a safety check
> on the request parameters. Could you try adding in the attached patch as
> well?
>
> What kind of systems are these? Are the two servers different
> architectures by any chance?
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
>> Thanks!
>> I have applied the patch.
>>
>> I have replaced the old logs with the new ones. Just use the previous
>> links.
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> thanks a lot for your help,
>> On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Thanks for posting the logs. It looks like the create_list function
>> within Trove actually generated the EINVAL error, but there
>> aren't enough log messages in that path to know why.
>>
>> Any chance you could apply the patch attached to this email and
>> retry this scenario (with verbose logging)? I'm hoping for some
>> extra output after the line that looks like this:
>>
>> (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>>
>>
>> thanks,
>> -Phil
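
One way to pull out just the region of interest from a verbose server log is sketched below. It is a minimal illustration, not part of PVFS2: the function name lines_after_marker and the sample log lines are made up for the example, and only the marker text is taken from the message above.

```python
import re

# Marker text taken from the log line quoted above; the address in
# parentheses varies from run to run, so it is not part of the pattern.
MARKER = re.compile(r"batch_create \(prelude sm\) state: perm_check")

def lines_after_marker(log_lines, context=5):
    """Yield each marker line plus the `context` lines that follow it."""
    remaining = 0
    for line in log_lines:
        if MARKER.search(line):
            remaining = context + 1  # the marker line itself plus context
        if remaining > 0:
            yield line.rstrip("\n")
            remaining -= 1

# Made-up sample input, only to show the behaviour.
sample = [
    "[D 04/06 20:41] (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)\n",
    "[D 04/06 20:41] hypothetical extra output line 1\n",
    "[D 04/06 20:41] hypothetical extra output line 2\n",
    "[D 04/06 20:41] unrelated line\n",
]
selected = list(lines_after_marker(sample, context=2))
for line in selected:
    print(line)
```

Against a real log, the same function could be fed an open file object, for example open("/tmp/pvfs2-server.log"), which is the LogFile path from the configuration quoted further down.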
>>
>>
>> Asterios Katsifodimos wrote:
>>
>> Yes, both of them, because now both are metadata servers. When I
>> had one metadata and one IO server, the metadata server did not
>> produce the errors until the IO server came up. From the moment the
>> IO server comes up, the metadata server goes crazy...
>>
>> I have uploaded the log files here:
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> have a look!
>>
>> thanks!
>> On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Ok. Could you try "verbose" now as the log level? It is close to
>> the "all" level but should only print information while the server
>> is busy.
>>
>> Are both wn140 and wn141 showing the same batch create errors, or
>> just one of them?
>>
>>
>> thanks,
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello Phil,
>>
>> Thanks for your answer.
>> Yes, I delete the storage dir every time I make a new configuration,
>> and I run the pvfs2-server -f command before starting the daemons.
>>
>> The only things that I get from the servers are the batch_create
>> errors, the server startup message, and the "PVFS2 server got signal 15
>> (server_status_flag: 507903" error message. Do you want me to try
>> another log level?
>>
>> Also, this is how the server is configured:
>> ***** Displaying PVFS Configuration Information *****
>> ------------------------------------------------------
>> PVFS2 configured to build karma gui : no
>> PVFS2 configured to perform coverage analysis : no
>> PVFS2 configured for aio threaded callbacks : yes
>> PVFS2 configured to use FUSE : no
>> PVFS2 configured for the 2.6.x kernel module : no
>> PVFS2 configured for the 2.4.x kernel module : no
>> PVFS2 configured for using the mmap-ra-cache : no
>> PVFS2 will use workaround for redhat 2.4 kernels : no
>> PVFS2 will use workaround for buggy NPTL : no
>> PVFS2 server will be built : yes
>>
>> PVFS2 version string: 2.8.1
>>
>>
>> thanks again,
>> On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <carns at mcs.anl.gov> wrote:
>>
>> Hello,
>>
>> I'm not sure what would cause that "Invalid argument" error.
>>
>> Could you try the following steps:
>>
>> - kill both servers
>> - modify your configuration files to set "EventLogging" to "none"
>> - delete your old log files (or move them to another directory)
>> - start the servers
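
The configuration-editing step in the list above could be scripted roughly as follows. This is only a sketch: set_event_logging is a hypothetical helper, not a PVFS2 tool, and it assumes the option appears on a line of its own in the form "EventLogging <level>", as in the configuration file quoted later in this thread.

```python
import re

def set_event_logging(config_text, level):
    """Return config_text with the EventLogging value replaced by `level`.

    Hypothetical helper: it edits text only and does not restart servers.
    """
    pattern = re.compile(r"^([ \t]*EventLogging[ \t]+)\S.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + level, config_text)
    if count == 0:
        raise ValueError("no EventLogging line found")
    return new_text

# Small made-up excerpt in the same shape as the <Defaults> section.
sample = "<Defaults>\n    EventLogging all\n    LogStamp datetime\n</Defaults>\n"
updated = set_event_logging(sample, "none")
print(updated)
```

The same call with "verbose" instead of "none" would cover the later request in this thread to raise the log level.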
>>
>> You can then send us the complete contents of both log files and we
>> can go from there. The "all" level is a little hard to interpret
>> because it generates a lot of information even when servers are idle.
>>
>> Also, when you went from one server to two, did you delete your old
>> storage space (/pvfs) and start over, or are you trying to keep that
>> data and add servers to it?
>>
>> thanks!
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello all,
>>
>> I have been trying to install PVFS 2.8.1 on Ubuntu server, CentOS 4, and
>> Scientific Linux 4. I compile it and can run it in a "single host"
>> configuration without any problems.
>>
>> However, when I add more nodes to the configuration (always using the
>> pvfs2-genconfig defaults), I have the following problem:
>>
>> *On the metadata node I get these messages:*
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>>
>>
>> *In the IO nodes I get:*
>> [root at wn140 ~]# tail -50 /tmp/pvfs2-server.log
>> [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: 1 operations remain
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: all operations complete
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>> [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec):  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245  23:53:06.225
>> [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023  00:00:01.020
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>> [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>> [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec):  23:53:12.356  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245
>> [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.020  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read              :             0             0             0             0             0             0
>> [P 04/02 23:53] bytes written           :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata reads          :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata writes         :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata dspace ops     :             0             0             0             0             0             0
>> [P 04/02 23:53] metadata keyval ops     :             1             1             1             1             1             1
>> [P 04/02 23:53] request scheduler       :             0             0             0             0             0             0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>
>>
>> The metadata node keeps asking for something that the IO nodes
>> cannot provide in the right way, so it complains. As a result,
>> neither the IO nodes nor the metadata node works.
>>
>> I have installed those services many times. I have tested this
>> using Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS, Scientific
>> Linux) and on Ubuntu server.
>>
>> I have also tried PVFS version 2.6.3 and I get the same problem.
>>
>> *My config files look like:*
>> [root at wn140 ~]# more /etc/pvfs2-fs.conf
>> <Defaults>
>> UnexpectedRequests 50
>> EventLogging all
>> EnableTracing no
>> LogStamp datetime
>> BMIModules bmi_tcp
>> FlowModules flowproto_multiqueue
>> PerfUpdateInterval 1000
>> ServerJobBMITimeoutSecs 30
>> ServerJobFlowTimeoutSecs 30
>> ClientJobBMITimeoutSecs 300
>> ClientJobFlowTimeoutSecs 300
>> ClientRetryLimit 5
>> ClientRetryDelayMilliSecs 2000
>> PrecreateBatchSize 512
>> PrecreateLowThreshold 256
>>
>> StorageSpace /pvfs
>> LogFile /tmp/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>> Alias wn140 tcp://wn140:3334
>> Alias wn141 tcp://wn141:3334
>> </Aliases>
>>
>> <Filesystem>
>> Name pvfs2-fs
>> ID 320870944
>> RootHandle 1048576
>> FileStuffing yes
>> <MetaHandleRanges>
>> Range wn140 3-2305843009213693953
>> Range wn141 2305843009213693954-4611686018427387904
>> </MetaHandleRanges>
>> <DataHandleRanges>
>> Range wn140 4611686018427387905-6917529027641081855
>> Range wn141 6917529027641081856-9223372036854775806
>> </DataHandleRanges>
>> <StorageHints>
>> TroveSyncMeta yes
>> TroveSyncData no
>> TroveMethod alt-aio
>> </StorageHints>
>> </Filesystem>
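
As a quick sanity check on a configuration like the one above, the handle ranges can be tested for overlap with a few lines of Python. This is a minimal sketch: the range values are copied from the quoted file, while parse_range and find_overlaps are illustrative names, not PVFS2 functions. It only checks the arithmetic of the ranges and says nothing about what causes the batch_create error.

```python
def parse_range(spec):
    """Split a 'low-high' handle range string into two integers."""
    low, high = spec.split("-")
    return int(low), int(high)

# Values copied from the MetaHandleRanges and DataHandleRanges sections.
RANGES = {
    ("meta", "wn140"): "3-2305843009213693953",
    ("meta", "wn141"): "2305843009213693954-4611686018427387904",
    ("data", "wn140"): "4611686018427387905-6917529027641081855",
    ("data", "wn141"): "6917529027641081856-9223372036854775806",
}

def find_overlaps(ranges):
    """Return pairs of neighbouring ranges (sorted by start) that overlap."""
    parsed = sorted((parse_range(spec), key) for key, spec in ranges.items())
    overlaps = []
    for (prev, prev_key), (cur, cur_key) in zip(parsed, parsed[1:]):
        if cur[0] <= prev[1]:  # next range starts before the previous ends
            overlaps.append((prev_key, cur_key))
    return overlaps

overlaps = find_overlaps(RANGES)
# Every upper bound should also fit in a signed 64-bit integer.
fits_signed_64 = all(parse_range(spec)[1] < 2**63 for spec in RANGES.values())
print("overlaps:", overlaps)
print("fits in signed 64-bit:", fits_signed_64)
```

For the ranges shown, the four intervals are contiguous and disjoint, so the overlap list comes back empty.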
>>
>>
>> My setup consists of two nodes that are both IO and metadata
>> nodes. I have also tried a 4-node setup with 2 I/O and 2 MD nodes,
>> resulting in the same thing.
>>
>> Any suggestions?
>>
>> thank you in advance,
>> --
>> Asterios Katsifodimos
>> High Performance Computing systems Lab
>> Department of Computer Science, University of Cyprus
>> http://www.asteriosk.gr <http://www.asteriosk.gr/>
>