[Pvfs2-developers] Re: noncontig-test

Scott Atchley atchley at myri.com
Thu Aug 2 15:17:04 EDT 2007


Kyle,

Are you using mpich-mx or mpich or mpich2? Are you using the bmi_mx  
code in PVFS cvs? I am not sure if mpich-mx supports non-contiguous  
data.

If you are using bmi_mx that is in your cvs, please try using the  
files I sent today (I have not had a chance to update my PVFS2 cvs  
and create a patch). Error 22 is EINVAL in Linux and I actually used  
that in some of my older code.

Also, can you run with PVFS2_DEBUGMASK=all? Can you edit
$PVFS2/src/io/bmi/bmi_mx/mx.h so that BMX_DEBUG is 1 and change:

#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN)

to

#define BMX_DB_MASK (BMX_DB_ERR|BMX_DB_WARN|BMX_DB_ALL)

There will be a lot of output but it may point out the issue.

Scott

On Aug 2, 2007, at 2:56 PM, Kyle Schochenmaier wrote:

> Sam and I looked into a problem we found with the noncontig-test  
> that I'm using as one of my benchmarks in my suite.
>
> Test setup:
> pvfs2-fs: MX on 4 data servers, 5th server is the client. (CVS Head)
>
> If I run the test using MX, it fails, but with TCP the test
> completes. We had originally thought that this was a problem in the
> pint-request code (as the log will indicate), but I'm wondering now
> why it would fail only with a different transport. To rule out the
> obvious problems, I've run other benchmarks using the same setup,
> before and after this error shows up, and those all run to
> completion just fine on both MX and TCP.
>
> Any ideas where to start with this?
>
> thanks,
> Kyle
>
> __Output__
>
> TCP:
>
> kschoche at bb18:~/framework/noncontig-test/noncontig$ mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 -timing
> ========= Parameter space dump =========
> filename: pvfs2://tmp/pvfs2/blah  ionodes
> file size (MB): 1 buffer size 0
> vector length: 10 element count: 1 vector count: 0
> striping factor: 0 striping size: -1 collective buffer size: 0
> loops: 1 displacement 0
> ========= Dump done            =========
> #* no verification possible!
>
> # testing noncontiguous in memory, noncontiguous in file using  
> independent I/O
> # vector count = 26214 - access count = 26214
> write bandwidth (min/max/acc [MB/s]) :  0.331 /  0.331 /  0.331
> read  bandwidth (min/max/acc [MB/s]) :  0.370 /  0.370 /  0.370
> file size: 1024kB  size per process: 1023kB
>
> # testing noncontiguous in memory, contiguous in file using  
> independent I/O
> # vector count = 26214 - access count = 26214
> write bandwidth (min/max/acc [MB/s]) :  0.692 /  0.692 /  0.692
> read  bandwidth (min/max/acc [MB/s]) :  0.766 /  0.766 /  0.766
> file size: 1023kB  size per process: 1023kB
>
> # testing contiguous in memory, noncontiguous in file using  
> independent I/O
> # vector count = 26214 - access count = 26214
> write bandwidth (min/max/acc [MB/s]) :  0.348 /  0.348 /  0.348
> read  bandwidth (min/max/acc [MB/s]) :  0.392 /  0.392 /  0.392
> file size: 1024kB  size per process: 1023kB
>
>
> MX:
> kschoche at bb18:~/framework/noncontig-test/noncontig$ `mpirun -np 1 ./noncontig -fname pvfs2://tmp/pvfs2/blah -fsize 1 &> mx_output`
>
>
> ========= Parameter space dump =========
> filename: pvfs2://tmp/pvfs2/blah  ionodes
> file size (MB): 1 buffer size 0
> vector length: 10 element count: 1 vector count: 0
> striping factor: 0 striping size: -1 collective buffer size: 0
> loops: 1 displacement 0
> ========= Dump done            =========
> #* no verification possible!
>
> # testing noncontiguous in memory, noncontiguous in file using  
> independent I/O
> # vector count = 26214 - access count = 26214
> [E 13:39:06.029976] src/io/description/pint-request.c line 95: PINT_process_request: no segments or bytes requested!
> [E 13:39:06.030497]     [bt] ./noncontig [0x4cd655]
> [E 13:39:06.030555]     [bt] ./noncontig [0x4b2e01]
> [E 13:39:06.030608]     [bt] ./noncontig [0x4ae8f1]
> [E 13:39:06.030658]     [bt] ./noncontig [0x507b62]
> [E 13:39:06.030707]     [bt] ./noncontig [0x5080dd]
> [E 13:39:06.030756]     [bt] ./noncontig [0x507e2f]
> [E 13:39:06.030806]     [bt] ./noncontig [0x4a5030]
> [E 13:39:06.030854]     [bt] ./noncontig [0x4ae202]
> [E 13:39:06.030903]     [bt] ./noncontig [0x4ae2d5]
> [E 13:39:06.030952]     [bt] ./noncontig [0x479ab0]
> [E 13:39:06.031001]     [bt] ./noncontig [0x41df43]
> [E 13:39:06.031072] PVFS_isys_io call: Invalid argument
> [0] Error -524286 in MPI_File_write
> Undefined dynamic error code
> [E 13:39:06.067249] Warning: non PVFS2 error code (22):
> [E 13:39:06.067468] Send immediately failed: Invalid argument
> [E 13:39:06.067525] Send error: cancelling recv.
> [E 13:39:06.067599] Warning: non PVFS2 error code (22):
> [E 13:39:06.067651] msgpair failed, will retry: Invalid argument
> [E 13:39:06.067706] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
> [E 13:39:06.067755] *** Non-BMI failure.
> [E 13:39:06.074742] Warning: non PVFS2 error code (22):
> [E 13:39:06.074795] Send immediately failed: Invalid argument
> [E 13:39:06.074843] Send error: cancelling recv.
> [E 13:39:06.074900] Warning: non PVFS2 error code (22):
> [E 13:39:06.074948] msgpair failed, will retry: Invalid argument
> [E 13:39:06.074998] *** msgpairarray_completion_fn: msgpair to server mx://bb15:0:3 failed: Invalid argument
> [E 13:39:06.075046] *** Non-BMI failure.
> [E 13:39:06.075396] Warning: non PVFS2 error code (22):
> [E 13:39:06.075447] Send immediately failed: Invalid argument
> [E 13:39:06.075493] Send error: cancelling recv.
> [E 13:39:06.075551] Warning: non PVFS2 error code (22):
> [E 13:39:06.075599] msgpair failed, will retry: Invalid argument
> [E 13:39:06.075649] *** msgpairarray_completion_fn: msgpair to  
> server mx://bb15:
>
>


