[Pvfs2-developers] Re: noncontig-test

Scott Atchley atchley at myri.com
Thu Aug 2 16:36:03 EDT 2007


Kyle,

I still do not see any bmi_mx messages. There should be hundreds. ;-)

Is the app statically linked? If so, did you recompile it?

Scott

On Aug 2, 2007, at 4:22 PM, Kyle Schochenmaier wrote:

> Scott Atchley wrote:
>> Kyle,
>>
>> Thanks.
>>
>> I do not see any bmi_mx error messages or any bmi_mx messages at  
>> all. Did you change BMX_DEBUG to 1 and add BMX_DB_ALL to  
>> BMX_DB_MASK and then make and make install?
>
> I changed them but didn't do a make clean. I redid that and got some
> other output for you.
> Attached this time is the correct output, heh.
>
> Kyle
>>
>> Scott
>>
>> On Aug 2, 2007, at 3:54 PM, Kyle Schochenmaier wrote:
>>
>>> Scott Atchley wrote:
>>>> Kyle,
>>>>
>>>> Are you using mpich-mx or mpich or mpich2? Are you using the
>>>> bmi_mx code in PVFS cvs? I am not sure if mpich-mx supports
>>>> non-contiguous data.
>>> I'm using mpich2.  mpich2-1.0.5p4, and CVS head.
>>>>
>>>> If you are using bmi_mx that is in your cvs, please try using  
>>>> the files I sent today (I have not had a chance to update my  
>>>> PVFS2 cvs and create a patch). Error 22 is EINVAL in Linux and I  
>>>> actually used that in some of my older code.
>>> I just built with your changes and the changes that follow, and
>>> still have the error.  I'll attach the logfile here; I'm not sure
>>> if it makes any more sense now than it did before :-/.
>>>
>>>
>>> thanks,
>>>
>>> Kyle
>>>>
>>>> Also, can you run with PVFS2_DEBUGMASK=all? Can you edit
>>>> $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:
>>>>
>>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)
>>>>
>>>> to
>>>>
>>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)
>>>>
>>>> There will be a lot of output but it may point out the issue.
>>>>
>>>> Scott
>>>>
>>>> On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:
>>>>
>>>>> Sam and I looked into a problem we found with the
>>>>> noncontig-test that I'm using as one of my benchmarks in my suite.
>>>>>
>>>>> Test setup:
>>>>> pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS  
>>>>> Head)
>>>>>
>>>>> If I run the test using MX, it fails, but with TCP the test
>>>>> completes. We had originally thought that this was a problem in
>>>>> the pint-request code (as the log will indicate), but I'm now
>>>>> wondering why it would fail only with a different transport. To
>>>>> rule out the obvious problems, I've run other benchmarks using
>>>>> the same setup, before and after this error shows up, and those
>>>>> all run to completion just fine on both MX and TCP.
>>>>>
>>>>> Any ideas where to start with this?
>>>>>
>>>>> thanks,
>>>>> Kyle
>>>>>
>>>>> __Output__
>>>>>
>>>>> TCP:
>>>>>
>>>>> kschoche at bb18:~/framework/noncontig-test/noncontig$ mpirun -np  
>>>>> 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
>>>>> ========= Parameter space dump =========
>>>>> filename: pvfs2://tmp/pvfs2/blah  ionodes
>>>>> file size (MB): 1 buffer size 0
>>>>> vector length: 10 element count: 1 vector count: 0
>>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>>> loops: 1 displacement 0
>>>>> ========= Dump done            =========
>>>>> #* no verification possible!
>>>>>
>>>>> # testing noncontiguous in memory, noncontiguous in file using  
>>>>> independent I/O
>>>>> # vector count = 26214 - access count = 26214
>>>>> write bandwidth (min/max/acc [MB/s]) :  0.331 /  0.331 /  0.331
>>>>> read  bandwidth (min/max/acc [MB/s]) :  0.370 /  0.370 /  0.370
>>>>> file size: 1024kB  size per process: 1023kB
>>>>>
>>>>> # testing noncontiguous in memory, contiguous in file using  
>>>>> independent I/O
>>>>> # vector count = 26214 - access count = 26214
>>>>> write bandwidth (min/max/acc [MB/s]) :  0.692 /  0.692 /  0.692
>>>>> read  bandwidth (min/max/acc [MB/s]) :  0.766 /  0.766 /  0.766
>>>>> file size: 1023kB  size per process: 1023kB
>>>>>
>>>>> # testing contiguous in memory, noncontiguous in file using  
>>>>> independent I/O
>>>>> # vector count = 26214 - access count = 26214
>>>>> write bandwidth (min/max/acc [MB/s]) :  0.348 /  0.348 /  0.348
>>>>> read  bandwidth (min/max/acc [MB/s]) :  0.392 /  0.392 /  0.392
>>>>> file size: 1024kB  size per process: 1023kB
>>>>>
>>>>>
>>>>> MX:
>>>>> kschoche at bb18:~/framework/noncontig-test/noncontig$ `mpirun -np  
>>>>> 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
>>>>>
>>>>>
>>>>> ========= Parameter space dump =========
>>>>> filename: pvfs2://tmp/pvfs2/blah  ionodes
>>>>> file size (MB): 1 buffer size 0
>>>>> vector length: 10 element count: 1 vector count: 0
>>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>>> loops: 1 displacement 0
>>>>> ========= Dump done            =========
>>>>> #* no verification possible!
>>>>>
>>>>> # testing noncontiguous in memory, noncontiguous in file using  
>>>>> independent I/O
>>>>> # vector count = 26214 - access count = 26214
>>>>> [E 13:39:06.029976] src/io/description/pint-request.c line 95:
>>>>> PINT_process_request: no segments or bytes requested!
>>>>> [E 13:39:06.030497]     [bt] ./noncontig [0x4cd655]
>>>>> [E 13:39:06.030555]     [bt] ./noncontig [0x4b2e01]
>>>>> [E 13:39:06.030608]     [bt] ./noncontig [0x4ae8f1]
>>>>> [E 13:39:06.030658]     [bt] ./noncontig [0x507b62]
>>>>> [E 13:39:06.030707]     [bt] ./noncontig [0x5080dd]
>>>>> [E 13:39:06.030756]     [bt] ./noncontig [0x507e2f]
>>>>> [E 13:39:06.030806]     [bt] ./noncontig [0x4a5030]
>>>>> [E 13:39:06.030854]     [bt] ./noncontig [0x4ae202]
>>>>> [E 13:39:06.030903]     [bt] ./noncontig [0x4ae2d5]
>>>>> [E 13:39:06.030952]     [bt] ./noncontig [0x479ab0]
>>>>> [E 13:39:06.031001]     [bt] ./noncontig [0x41df43]
>>>>> [E 13:39:06.031072] PVFS_isys_io call: Invalid argument
>>>>> [0] Error -524286 in MPI_File_write
>>>>> Undefined dynamic error code
>>>>> [E 13:39:06.067249] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.067468] Send immediately failed: Invalid argument
>>>>> [E 13:39:06.067525] Send error: cancelling recv.
>>>>> [E 13:39:06.067599] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.067651] msgpair failed, will retry: Invalid argument
>>>>> [E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to
>>>>> server mx://bb15:0:3 failed: Invalid argument
>>>>> [E 13:39:06.067755] *** Non-BMI failure.
>>>>> [E 13:39:06.074742] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.074795] Send immediately failed: Invalid argument
>>>>> [E 13:39:06.074843] Send error: cancelling recv.
>>>>> [E 13:39:06.074900] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.074948] msgpair failed, will retry: Invalid argument
>>>>> [E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to
>>>>> server mx://bb15:0:3 failed: Invalid argument
>>>>> [E 13:39:06.075046] *** Non-BMI failure.
>>>>> [E 13:39:06.075396] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.075447] Send immediately failed: Invalid argument
>>>>> [E 13:39:06.075493] Send error: cancelling recv.
>>>>> [E 13:39:06.075551] Warning: non PVFS2 error code (22):
>>>>> [E 13:39:06.075599] msgpair failed, will retry: Invalid argument
>>>>> [E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to  
>>>>> server mx://bb15:
>>>>>
>>>>>
>>>>
>>>
>>> <mx.output.bz2>
>>
>
> <nc.out2.bz2>
