[Pvfs2-users] romio problems

Jan Lindheim lindheim at cacr.caltech.edu
Thu Mar 22 11:57:22 EST 2007


 >> We have found that when trying to use pvfs with romio under openmpi,
 >> we are getting errors when the task count is bigger than 128, using
 >> 1MB messages.  Smaller message sizes and larger task counts also cause
 >> the same error to be generated, just not as consistently or quickly.
 >> Errors that we see look like:
 >>
 >> [E 15:05:50.012128] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 34.
 >> [E 15:05:50.012380] msgpair failed, will retry: Operation cancelled (possibly due to timeout)

 >Just want to understand your workload a bit:
 >You are doing a collective write with 128 processes each writing 1MB,
 >right?

No, the code is not using collective writes; each task writes independently.

 >> Writing to an NFS-mounted file system instead of PVFS works fine, even
 >> with 256 tasks.
 >> Our version of PVFS is 2.6.2.  Both openmpi 1.1.x and 1.2 produce the
 >> same errors.  Any known limitations with romio and PVFS?
 >> We can supply you with a test code if you are interested in reproducing
 >> the problem.  The code should compile well with mpich as well as
 >> openmpi.

 >Go ahead and send the test code, but it really looks like you are
 >pushing the servers hard and hitting a timeout.  How many servers do
 >you have for this many clients?  PVFS should be smarter about such a
 >situation, but could you check something for us?  In your fs.conf,
 >what is the value of ServerJobBMITimeoutSecs ?

 >http://www.pvfs.org/pvfs2-options.html#ServerJobBMITimeoutSecs

 >If you increase that value to, say, 3600, we can ensure the timeouts
 >won't get triggered.

 >I have a few other ideas, but let's try this one first.

 >==rob

 >--
 >Rob Latham
 >Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
 >Argonne National Lab, IL USA                 B29D F333 664A 4280 315B

For this PVFS file system, we are using 8 I/O servers and one metadata
server.  I have adjusted the value of ServerJobBMITimeoutSecs on all
the servers involved; they had the default value of 30.  I will try
to schedule an interruption later today to restart the pvfs2-server
processes, and I will let you know how the next test goes after that.
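
For anyone making the same change across several servers, the edit can be
scripted.  What follows is only a minimal sketch, not part of PVFS: it
assumes the option appears in fs.conf as a whitespace-separated
"name value" pair on its own line, and the function name
set_fs_conf_option and the sample fragment (including the enclosing
Defaults section) are hypothetical.  The value 3600 is the one suggested
above.

```python
import re


def set_fs_conf_option(conf_text: str, name: str, value: int) -> str:
    """Return conf_text with each `name <number>` line set to `value`.

    Assumption: the option is written as a whitespace-separated
    key/value pair on its own line. If the option is absent we raise
    instead of guessing where it should be added.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(name)}[ \t]+)\d+[ \t]*$", re.MULTILINE
    )
    # Keep the original indentation and option name, swap only the number.
    new_text, count = pattern.subn(rf"\g<1>{value}", conf_text)
    if count == 0:
        raise KeyError(f"{name} not found in config text")
    return new_text


# Hypothetical fragment for illustration; a real fs.conf holds much more.
sample = "<Defaults>\n\tServerJobBMITimeoutSecs 30\n</Defaults>\n"

updated = set_fs_conf_option(sample, "ServerJobBMITimeoutSecs", 3600)
print(updated)
```

The function only rewrites text; the edited file still has to be put in
place on every server, and the pvfs2-server processes restarted, before
the new timeout takes effect.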

Attached is the test code.  The tarball contains two subdirectories,
utilities and mpi_io_test.  You need to cd into mpi_io_test/src, where
you'll find a README file that describes the problems we see on our
cluster, the specifics of our software environment, and how to build
and run the code.

Jan Lindheim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi-io-test.tar
Type: application/x-tar
Size: 972800 bytes
Desc: not available
Url : http://www.beowulf-underground.org/pipermail/pvfs2-users/attachments/20070322/075d4597/mpi-io-test-0001.tar

