[Pvfs2-developers] Strange behavior with level2 (MPI-IO.c)
Sam Lang
slang at mcs.anl.gov
Thu Mar 8 12:07:17 EST 2007
Hi Julian,
I have a few ideas for you to try to help narrow down these bugs.
I'm not sure how well the small-io stuff will work with non-contig.
It was never rigorously tested. Can you recompile with
-DPVFS2_SMALL_IO_OFF and run your tests again?
I've attached a patch that fixes the last valgrind error in your list
(in PINT_distribute). Can you try it and let me know if that fixes it?
Thanks,
-sam
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memset-fdata-smallio.patch
Type: application/octet-stream
Size: 628 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-developers/attachments/20070308/84df2ad9/memset-fdata-smallio.obj
-------------- next part --------------
On Mar 8, 2007, at 10:50 AM, Julian Martin Kunkel wrote:
> Hi,
>
> We see strange, incorrect behavior with PVFS2 when using a file view
> with MPI-IO at different access levels :)
>
> mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level0
> 0000000 0000 0000 0000 0000 0000 0101 0101 0101
> 0000010 0101 0101 0101 0101 0101 0101 0101 0101
> *
> 0000030 0101 0000 0000 0000 0000 0000 0000 0000
> 0000040 0000 0000 0000
> 0000046
>
> mpiexec -np 2 ./MPI-IO -i 4 -f pvfs2://pvfs2/test -s 10 level2
>
> 0000000 0000 0000 0000 0000 0000 0101 0101 0101
> 0000010 0101 0101 0000 0000 0000 0000 0000 0101
> 0000020 0101 0101 0101 0101 0000 0000 0000 0000
> 0000030 0000 0101 0101 0101 0101 0101 0101 0101
> 0000040 0101 0101 0101
> 0000046
>
> With this level, the number of bytes transferred between client and
> servers also does not match the amount of data that should be sent...
>
> With level3 (non-contig, coll) and level1 (coll, contig) the output
> looks correct:
> 0000000 0000 0000 0000 0000 0000 0101 0101 0101
> 0000010 0101 0101 0000 0000 0000 0000 0000 0101
> 0000020 0101 0101 0101 0101 0000 0000 0000 0000
> 0000030 0000 0101 0101 0101 0101 0101 0000 0000
> 0000040 0000 0000 0000 0101 0101 0101 0101 0101
> 0000050
>
> The minimum setup where this error occurred was 3 data servers.
> However, with 4 data servers, for example, the bug sometimes
> disappears. Using 5 data servers and a bigger file (500K)
> (mpiexec -np 4 ./MPI-IO -i 10 -f pvfs2://pvfs2/test -s 50K level2)
> shows that the content of the file differs between runs. The md5sum
> might be, for example:
> c809928d82ca72e00469283f2450c5f0
> 7d215f060b113f81c2210ac6e8e4c6d9
> b4ca34c8a8a7b06a9b6d29e4b78964c3
>
> Software: PVFS2 03/08/07 CVS and the new tiled-types-for-mkuhn.diff
> patch with the current mpich2-1.0.5-p3...
>
> I did some runs of the levels under valgrind. Among other reported
> issues, it showed the following in level0 and level2:
> ==18294== Invalid read of size 4
> ==18294== at 0x80EF461: ADIOI_PVFS2_WriteStrided (ad_pvfs2_write.c:392)
> ==18294== by 0x80AA299: MPIOI_File_write (write.c:156)
> ==18294== by 0x80A9C80: PMPI_File_write (write.c:52)
> ==18294== by 0x8056706: ??? (log_mpi_io.c:871)
> ==18294== by 0x804ACDA: Test_level0 (MPI-IO.c:75)
> ==18294== by 0x804B699: main (MPI-IO.c:309)
> ==18294== Address 0x4771460 is 0 bytes after a block of size 8 alloc'd
> ==18294== at 0x401B867: malloc (vg_replace_malloc.c:149)
> ==18294== by 0x80B505C: ADIOI_Malloc_fn (malloc.c:50)
> ==18294== by 0x80B4D66: ADIOI_Optimize_flattened (flatten.c:759)
> ==18294== by 0x80B3036: ADIOI_Flatten_datatype (flatten.c:79)
> ==18294== by 0x80BF8C8: ADIO_Set_view (ad_set_view.c:52)
> ==18294== by 0x80AA85A: PMPI_File_set_view (set_view.c:138)
> ==18294== by 0x8055CDE: MPI_File_set_view (log_mpi_io.c:611)
> ==18294== by 0x804AC80: Test_level0 (MPI-IO.c:70)
> ==18294== by 0x804B699: main (MPI-IO.c:309)
>
> Similar for reads in ReadStrided...
> These issues are not reported for the other levels and look rather
> suspicious to me...
>
> The following issue is common for all levels:
> ==18315== Conditional jump or move depends on uninitialised value(s)
> ==18315== at 0x8121869: PINT_distribute (pint-request.c:740)
> ==18315== by 0x811FB0B: PINT_process_request (pint-request.c:322)
> ==18315== by 0x8139641: small_io_completion_fn (sys-small-io.sm:257)
> ==18315== by 0x8180DD9: msgpairarray_completion_fn (msgpairarray.sm:547)
> ==18315== by 0x812A648: PINT_state_machine_next (state-machine-fns.h:158)
> ==18315== by 0x8129D3D: PINT_client_state_machine_test (client-state-machine.c:559)
> ==18315== by 0x812A1C3: PINT_client_wait_internal (client-state-machine.c:733)
> ==18315== by 0x812A3C5: PVFS_sys_wait (client-state-machine.c:861)
> ==18315== by 0x813300A: PVFS_sys_io (sys-io.sm:351)
> ==18315== by 0x80ECCCD: ADIOI_PVFS2_ReadStrided (ad_pvfs2_read.c:500)
> ==18315== by 0x80A9571: MPIOI_File_read (read.c:151)
> ==18315== by 0x80A8F58: PMPI_File_read (read.c:52)
>
>
> Thanks,
> Julian
>