[Pvfs2-developers] Re: noncontig-test
Kyle Schochenmaier
kschoche at scl.ameslab.gov
Thu Aug 2 16:22:44 EDT 2007
Scott Atchley wrote:
> Kyle,
>
> Thanks.
>
> I do not see any bmi_mx error messages or any bmi_mx messages at all.
> Did you change BMX_DEBUG to 1 and add BMX_DB_ALL to BMX_DB_MASK and
> then make and make install?
I changed them but didn't do a make clean. I redid that and got some other
output for you.
Attached this time is the correct output, heh.
Kyle
>
> Scott
>
> On Aug 2, 2007, at 3:54 PM, Kyle Schochenmaier wrote:
>
>> Scott Atchley wrote:
>>> Kyle,
>>>
>>> Are you using mpich-mx or mpich or mpich2? Are you using the bmi_mx
>>> code in PVFS cvs? I am not sure if mpich-mx supports non-contiguous
>>> data.
>> I'm using mpich2. mpich2-1.0.5p4, and CVS head.
>>>
>>> If you are using bmi_mx that is in your cvs, please try using the
>>> files I sent today (I have not had a chance to update my PVFS2 cvs
>>> and create a patch). Error 22 is EINVAL in Linux and I actually used
>>> that in some of my older code.
>> I just built with your changes and the changes that follow, and still
>> have the error. I'll attach the logfile here; I'm not sure if it
>> makes any more sense now than it did before :-/.
>>
>>
>> thanks,
>>
>> Kyle
>>>
>>> Also, can you run with PVFS2_DEBUGMASK=all? Can you edit
>>> $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:
>>>
>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)
>>>
>>> to
>>>
>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)
>>>
>>> There will be a lot of output but it may point out the issue.
>>>
>>> Scott
>>>
>>> On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:
>>>
>>>> Sam and I looked into a problem we found with the noncontig-test
>>>> that I'm using as one of my benchmarks in my suite.
>>>>
>>>> Test setup:
>>>> pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
>>>>
>>>> If I run the test using MX, it fails, but with TCP the test
>>>> completes. We had originally thought that this was a problem in the
>>>> pint-request code (as the log will indicate), but I'm wondering now
>>>> why it would fail using a different transport. To rule out the
>>>> obvious problems, I've run other benchmarks using the same setup,
>>>> before and after this error shows up, and those all run to
>>>> completion just fine on both MX and TCP.
>>>>
>>>> Any ideas where to start with this?
>>>>
>>>> thanks,
>>>> Kyle
>>>>
>>>> __Output__
>>>>
>>>> TCP:
>>>>
>>>> kschoche at bb18:~/framework/noncontig-test/noncontig$ mpirun -np 1
>>>> ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
>>>> ========= Parameter space dump =========
>>>> filename: pvfs2://tmp/pvfs2/blah ionodes
>>>> file size (MB): 1 buffer size 0
>>>> vector length: 10 element count: 1 vector count: 0
>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>> loops: 1 displacement 0
>>>> ========= Dump done =========
>>>> #* no verification possible!
>>>>
>>>> # testing noncontiguous in memory, noncontiguous in file using
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) : 0.331 / 0.331 / 0.331
>>>> read bandwidth (min/max/acc [MB/s]) : 0.370 / 0.370 / 0.370
>>>> file size: 1024kB size per process: 1023kB
>>>>
>>>> # testing noncontiguous in memory, contiguous in file using
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) : 0.692 / 0.692 / 0.692
>>>> read bandwidth (min/max/acc [MB/s]) : 0.766 / 0.766 / 0.766
>>>> file size: 1023kB size per process: 1023kB
>>>>
>>>> # testing contiguous in memory, noncontiguous in file using
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) : 0.348 / 0.348 / 0.348
>>>> read bandwidth (min/max/acc [MB/s]) : 0.392 / 0.392 / 0.392
>>>> file size: 1024kB size per process: 1023kB
>>>>
>>>>
>>>> MX:
>>>> kschoche at bb18:~/framework/noncontig-test/noncontig$ `mpirun -np 1
>>>> ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
>>>>
>>>>
>>>> ========= Parameter space dump =========
>>>> filename: pvfs2://tmp/pvfs2/blah ionodes
>>>> file size (MB): 1 buffer size 0
>>>> vector length: 10 element count: 1 vector count: 0
>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>> loops: 1 displacement 0
>>>> ========= Dump done =========
>>>> #* no verification possible!
>>>>
>>>> # testing noncontiguous in memory, noncontiguous in file using
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> [E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
>>>> [E 13:39:06.030497] [bt] ./noncontig [0x4cd655]
>>>> [E 13:39:06.030555] [bt] ./noncontig [0x4b2e01]
>>>> [E 13:39:06.030608] [bt] ./noncontig [0x4ae8f1]
>>>> [E 13:39:06.030658] [bt] ./noncontig [0x507b62]
>>>> [E 13:39:06.030707] [bt] ./noncontig [0x5080dd]
>>>> [E 13:39:06.030756] [bt] ./noncontig [0x507e2f]
>>>> [E 13:39:06.030806] [bt] ./noncontig [0x4a5030]
>>>> [E 13:39:06.030854] [bt] ./noncontig [0x4ae202]
>>>> [E 13:39:06.030903] [bt] ./noncontig [0x4ae2d5]
>>>> [E 13:39:06.030952] [bt] ./noncontig [0x479ab0]
>>>> [E 13:39:06.031001] [bt] ./noncontig [0x41df43]
>>>> [E 13:39:06.031072] PVFS_isys_io call: Invalid argument
>>>> [0] Error -524286 in MPI_File_write
>>>> Undefined dynamic error code
>>>> [E 13:39:06.067249] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.067468] Send immediately failed: Invalid argument
>>>> [E 13:39:06.067525] Send error: cancelling recv.
>>>> [E 13:39:06.067599] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.067651] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
>>>> [E 13:39:06.067755] *** Non-BMI failure.
>>>> [E 13:39:06.074742] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.074795] Send immediately failed: Invalid argument
>>>> [E 13:39:06.074843] Send error: cancelling recv.
>>>> [E 13:39:06.074900] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.074948] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
>>>> [E 13:39:06.075046] *** Non-BMI failure.
>>>> [E 13:39:06.075396] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.075447] Send immediately failed: Invalid argument
>>>> [E 13:39:06.075493] Send error: cancelling recv.
>>>> [E 13:39:06.075551] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.075599] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to
>>>> server mx://bb15:
>>>>
>>>>
>>>
>>
>> <mx.output.bz2>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nc.out2.bz2
Type: application/x-bzip
Size: 85020 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070802/27542fed/nc.out2-0001.bin