[Pvfs2-developers] Re: noncontig-test
Scott Atchley
atchley at myri.com
Thu Aug 2 16:09:02 EDT 2007
Kyle,
Thanks.
I do not see any bmi_mx error messages or any bmi_mx messages at all.
Did you change BMX_DEBUG to 1 and add BMX_DB_ALL to BMX_DB_MASK and
then make and make install?
Scott
On Aug 2, 2007, at 3:54 PM, Kyle Schochenmaier wrote:
> Scott Atchley wrote:
>> Kyle,
>>
>> Are you using mpich-mx or mpich or mpich2? Are you using the
>> bmi_mx code in PVFS cvs? I am not sure if mpich-mx supports non-
>> contiguous data.
> I'm using mpich2. mpich2-1.0.5p4, and CVS head.
>>
>> If you are using bmi_mx that is in your cvs, please try using the
>> files I sent today (I have not had a chance to update my PVFS2 cvs
>> and create a patch). Error 22 is EINVAL in Linux and I actually
>> used that in some of my older code.
> I just built with your changes and the changes that follow, and
> still have the error. I'll attach the logfile here, I'm not sure
> if it makes any more sense now than it did before :-/.
>
>
> thanks,
>
> Kyle
>>
>> Also, can you run with PVFS2_DEBUGMASK=all? Can you edit
>> $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:
>>
>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)
>>
>> to
>>
>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)
>>
>> There will be a lot of output but it may point out the issue.
>>
>> Scott
>>
>> On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:
>>
>>> Sam and I looked into a problem we found with the noncontig-test
>>> that I'm using as one of my benchmarks in my suite.
>>>
>>> Test setup:
>>> pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
>>>
>>> If I run the test using MX, it fails, but with TCP the test
>>> completes. We had originally thought that this was a problem in
>>> the pint-request code (as the log will indicate), but now I'm
>>> wondering why it would fail only with a different transport. To
>>> rule out the obvious problems, I've run other benchmarks using
>>> the same setup, before and after this error shows up, and those
>>> all run to completion just fine on both MX and TCP.
>>>
>>> Any ideas where to start with this?
>>>
>>> thanks,
>>> Kyle
>>>
>>> __Output__
>>>
>>> TCP:
>>>
>>> kschoche@bb18:~/framework/noncontig-test/noncontig$ mpirun -np
>>> 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
>>> ========= Parameter space dump =========
>>> filename: pvfs2://tmp/pvfs2/blah ionodes
>>> file size (MB): 1 buffer size 0
>>> vector length: 10 element count: 1 vector count: 0
>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>> loops: 1 displacement 0
>>> ========= Dump done =========
>>> #* no verification possible!
>>>
>>> # testing noncontiguous in memory, noncontiguous in file using
>>> independent I/O
>>> # vector count = 26214 - access count = 26214
>>> write bandwidth (min/max/acc [MB/s]) : 0.331 / 0.331 / 0.331
>>> read bandwidth (min/max/acc [MB/s]) : 0.370 / 0.370 / 0.370
>>> file size: 1024kB size per process: 1023kB
>>>
>>> # testing noncontiguous in memory, contiguous in file using
>>> independent I/O
>>> # vector count = 26214 - access count = 26214
>>> write bandwidth (min/max/acc [MB/s]) : 0.692 / 0.692 / 0.692
>>> read bandwidth (min/max/acc [MB/s]) : 0.766 / 0.766 / 0.766
>>> file size: 1023kB size per process: 1023kB
>>>
>>> # testing contiguous in memory, noncontiguous in file using
>>> independent I/O
>>> # vector count = 26214 - access count = 26214
>>> write bandwidth (min/max/acc [MB/s]) : 0.348 / 0.348 / 0.348
>>> read bandwidth (min/max/acc [MB/s]) : 0.392 / 0.392 / 0.392
>>> file size: 1024kB size per process: 1023kB
>>>
>>>
>>> MX:
>>> kschoche@bb18:~/framework/noncontig-test/noncontig$ `mpirun -np
>>> 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
>>>
>>>
>>> ========= Parameter space dump =========
>>> filename: pvfs2://tmp/pvfs2/blah ionodes
>>> file size (MB): 1 buffer size 0
>>> vector length: 10 element count: 1 vector count: 0
>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>> loops: 1 displacement 0
>>> ========= Dump done =========
>>> #* no verification possible!
>>>
>>> # testing noncontiguous in memory, noncontiguous in file using
>>> independent I/O
>>> # vector count = 26214 - access count = 26214
>>> [E 13:39:06.029976] src/io/description/pint-request.c line 95:
>>> PINT_process_request: no segments or bytes requested!
>>> [E 13:39:06.030497] [bt] ./noncontig [0x4cd655]
>>> [E 13:39:06.030555] [bt] ./noncontig [0x4b2e01]
>>> [E 13:39:06.030608] [bt] ./noncontig [0x4ae8f1]
>>> [E 13:39:06.030658] [bt] ./noncontig [0x507b62]
>>> [E 13:39:06.030707] [bt] ./noncontig [0x5080dd]
>>> [E 13:39:06.030756] [bt] ./noncontig [0x507e2f]
>>> [E 13:39:06.030806] [bt] ./noncontig [0x4a5030]
>>> [E 13:39:06.030854] [bt] ./noncontig [0x4ae202]
>>> [E 13:39:06.030903] [bt] ./noncontig [0x4ae2d5]
>>> [E 13:39:06.030952] [bt] ./noncontig [0x479ab0]
>>> [E 13:39:06.031001] [bt] ./noncontig [0x41df43]
>>> [E 13:39:06.031072] PVFS_isys_io call: Invalid argument
>>> [0] Error -524286 in MPI_File_write
>>> Undefined dynamic error code
>>> [E 13:39:06.067249] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.067468] Send immediately failed: Invalid argument
>>> [E 13:39:06.067525] Send error: cancelling recv.
>>> [E 13:39:06.067599] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.067651] msgpair failed, will retry: Invalid argument
>>> [E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to
>>> server mx://bb15:0:3 failed: Invalid argument
>>> [E 13:39:06.067755] *** Non-BMI failure.
>>> [E 13:39:06.074742] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.074795] Send immediately failed: Invalid argument
>>> [E 13:39:06.074843] Send error: cancelling recv.
>>> [E 13:39:06.074900] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.074948] msgpair failed, will retry: Invalid argument
>>> [E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to
>>> server mx://bb15:0:3 failed: Invalid argument
>>> [E 13:39:06.075046] *** Non-BMI failure.
>>> [E 13:39:06.075396] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.075447] Send immediately failed: Invalid argument
>>> [E 13:39:06.075493] Send error: cancelling recv.
>>> [E 13:39:06.075551] Warning: non PVFS2 error code (22):
>>> [E 13:39:06.075599] msgpair failed, will retry: Invalid argument
>>> [E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to
>>> server mx://bb15:
>>>
>>>
>>
>
> <mx.output.bz2>