[Pvfs2-developers] Re: noncontig-test

Kyle Schochenmaier kschoche at scl.ameslab.gov
Thu Aug 2 16:22:44 EDT 2007


Scott Atchley wrote:
> Kyle,
>
> Thanks.
>
> I do not see any bmi_mx error messages or any bmi_mx messages at all. 
> Did you change BMX_DEBUG to 1 and add BMX_DB_ALL to BMX_DB_MASK and 
> then make and make install?

I changed them but didn't do a make clean. I redid the build and got some 
other output for you.
The correct output is attached this time, heh.

Kyle
>
> Scott
>
> On Aug 2, 2007, at 3:54 PM, Kyle Schochenmaier wrote:
>
>> Scott Atchley wrote:
>>> Kyle,
>>>
>>> Are you using mpich-mx or mpich or mpich2? Are you using the bmi_mx 
>>> code in PVFS cvs? I am not sure if mpich-mx supports non-contiguous 
>>> data.
>> I'm using mpich2.  mpich2-1.0.5p4, and CVS head.
>>>
>>> If you are using bmi_mx that is in your cvs, please try using the 
>>> files I sent today (I have not had a chance to update my PVFS2 cvs 
>>> and create a patch). Error 22 is EINVAL in Linux and I actually used 
>>> that in some of my older code.
>> I just built with your changes and the changes that follow, and still 
>> have the error.  I'll attach the logfile here, I'm not sure if it 
>> makes any more sense now than it did before :-/.
>>
>>
>> thanks,
>>
>> Kyle
>>>
>>> Also, can you run with PVFS2_DEBUGMASK=all? Can you edit 
>>> $PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:
>>>
>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)
>>>
>>> to
>>>
>>> #define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)
>>>
>>> There will be a lot of output but it may point out the issue.
>>>
>>> Scott
>>>
>>> On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:
>>>
>>>> Sam and I looked into a problem we found with the noncontig-test 
>>>> that I'm using as one of my benchmarks in my suite.
>>>>
>>>> Test setup:
>>>> pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
>>>>
>>>> If I run the test using MX, it fails, but with TCP the test 
>>>> completes. We had originally thought that this was a problem in the 
>>>> pint-request code (as the log will indicate), but I'm wondering now 
>>>> why it would fail only with a different transport. To rule out the 
>>>> obvious problems, I've run other benchmarks using the same setup, 
>>>> before and after this error shows up, and those all run to 
>>>> completion just fine on both MX and TCP.
>>>>
>>>> Any ideas where to start with this?
>>>>
>>>> thanks,
>>>> Kyle
>>>>
>>>> __Output__
>>>>
>>>> TCP:
>>>>
>>>> kschoche@bb18:~/framework/noncontig-test/noncontig$ mpirun -np 1 
>>>> ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
>>>> ========= Parameter space dump =========
>>>> filename: pvfs2://tmp/pvfs2/blah  ionodes
>>>> file size (MB): 1 buffer size 0
>>>> vector length: 10 element count: 1 vector count: 0
>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>> loops: 1 displacement 0
>>>> ========= Dump done            =========
>>>> #* no verification possible!
>>>>
>>>> # testing noncontiguous in memory, noncontiguous in file using 
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) :  0.331 /  0.331 /  0.331
>>>> read  bandwidth (min/max/acc [MB/s]) :  0.370 /  0.370 /  0.370
>>>> file size: 1024kB  size per process: 1023kB
>>>>
>>>> # testing noncontiguous in memory, contiguous in file using 
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) :  0.692 /  0.692 /  0.692
>>>> read  bandwidth (min/max/acc [MB/s]) :  0.766 /  0.766 /  0.766
>>>> file size: 1023kB  size per process: 1023kB
>>>>
>>>> # testing contiguous in memory, noncontiguous in file using 
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> write bandwidth (min/max/acc [MB/s]) :  0.348 /  0.348 /  0.348
>>>> read  bandwidth (min/max/acc [MB/s]) :  0.392 /  0.392 /  0.392
>>>> file size: 1024kB  size per process: 1023kB
>>>>
>>>>
>>>> MX:
>>>> kschoche@bb18:~/framework/noncontig-test/noncontig$ `mpirun -np 1 
>>>> ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
>>>>
>>>>
>>>> ========= Parameter space dump =========
>>>> filename: pvfs2://tmp/pvfs2/blah  ionodes
>>>> file size (MB): 1 buffer size 0
>>>> vector length: 10 element count: 1 vector count: 0
>>>> striping factor: 0 striping size: -1 collective buffer size: 0
>>>> loops: 1 displacement 0
>>>> ========= Dump done            =========
>>>> #* no verification possible!
>>>>
>>>> # testing noncontiguous in memory, noncontiguous in file using 
>>>> independent I/O
>>>> # vector count = 26214 - access count = 26214
>>>> [E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
>>>> [E 13:39:06.030497]     [bt] ./noncontig [0x4cd655]
>>>> [E 13:39:06.030555]     [bt] ./noncontig [0x4b2e01]
>>>> [E 13:39:06.030608]     [bt] ./noncontig [0x4ae8f1]
>>>> [E 13:39:06.030658]     [bt] ./noncontig [0x507b62]
>>>> [E 13:39:06.030707]     [bt] ./noncontig [0x5080dd]
>>>> [E 13:39:06.030756]     [bt] ./noncontig [0x507e2f]
>>>> [E 13:39:06.030806]     [bt] ./noncontig [0x4a5030]
>>>> [E 13:39:06.030854]     [bt] ./noncontig [0x4ae202]
>>>> [E 13:39:06.030903]     [bt] ./noncontig [0x4ae2d5]
>>>> [E 13:39:06.030952]     [bt] ./noncontig [0x479ab0]
>>>> [E 13:39:06.031001]     [bt] ./noncontig [0x41df43]
>>>> [E 13:39:06.031072] PVFS_isys_io call: Invalid argument
>>>> [0] Error -524286 in MPI_File_write
>>>> Undefined dynamic error code
>>>> [E 13:39:06.067249] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.067468] Send immediately failed: Invalid argument
>>>> [E 13:39:06.067525] Send error: cancelling recv.
>>>> [E 13:39:06.067599] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.067651] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
>>>> [E 13:39:06.067755] *** Non-BMI failure.
>>>> [E 13:39:06.074742] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.074795] Send immediately failed: Invalid argument
>>>> [E 13:39:06.074843] Send error: cancelling recv.
>>>> [E 13:39:06.074900] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.074948] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
>>>> [E 13:39:06.075046] *** Non-BMI failure.
>>>> [E 13:39:06.075396] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.075447] Send immediately failed: Invalid argument
>>>> [E 13:39:06.075493] Send error: cancelling recv.
>>>> [E 13:39:06.075551] Warning: non PVFS2 error code (22):
>>>> [E 13:39:06.075599] msgpair failed, will retry: Invalid argument
>>>> [E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to 
>>>> server mx://bb15:
>>>>
>>>>
>>>
>>
>> <mx.output.bz2>
>
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
>
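
On the "non PVFS2 error code (22)" lines quoted above: Scott notes that 22 
is EINVAL on Linux, which matches the "Invalid argument" text in the log. 
A minimal C sketch for checking that mapping follows. describe_errno is a 
hypothetical helper for illustration only, not part of PVFS2 or bmi_mx, and 
the numeric value of EINVAL is platform-specific.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a raw errno value next to its message text,
 * e.g. to decode the "(22)" that shows up in the client log. */
static void describe_errno(int code)
{
    printf("errno %d: %s\n", code, strerror(code));
}
```

Calling describe_errno(22) on a Linux/glibc system should print the same 
"Invalid argument" string that appears in the log.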

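The mx.h change discussed above (BMX_DEBUG set to 1, BMX_DB_ALL added to 
BMX_DB_MASK) looks like the usual compile-time bitmask pattern, which would 
explain why the extra output only appeared after a clean rebuild. A rough 
sketch of that pattern, assuming made-up bit values, a hypothetical 
BMX_DB_EXAMPLE level and a hypothetical bmx_db_enabled helper; the real 
definitions are in src/io/bmi/bmi_mx/mx.h and may differ:

```c
#include <stdio.h>

/* ASSUMPTION: these bit values are invented for illustration. */
#define BMX_DB_ERR      (1u << 0)
#define BMX_DB_WARN     (1u << 1)
#define BMX_DB_EXAMPLE  (1u << 5)   /* hypothetical extra debug level */
#define BMX_DB_ALL      0xFFFFu     /* assumed to cover every level */

#define BMX_DEBUG 1
#define BMX_DB_MASK (BMX_DB_ERR | BMX_DB_WARN | BMX_DB_ALL)

/* Hypothetical helper: nonzero when a message at `level` passes `mask`. */
static int bmx_db_enabled(unsigned mask, unsigned level)
{
    return BMX_DEBUG && (mask & level) != 0;
}

/* Print only when the compiled-in mask enables the given level. */
#define BMX_DB_PRINT(level, ...)                      \
    do {                                              \
        if (bmx_db_enabled(BMX_DB_MASK, (level)))     \
            fprintf(stderr, __VA_ARGS__);             \
    } while (0)
```

With the narrower mask (BMX_DB_ERR|BMX_DB_WARN), a message at the example 
level would be filtered out; with BMX_DB_ALL in the mask it gets through. 
Because the mask is fixed at compile time, every object file that includes 
the header has to be rebuilt for a change to take effect.
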
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nc.out2.bz2
Type: application/x-bzip
Size: 85020 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070802/27542fed/nc.out2-0001.bin
