[Pvfs2-users] PVFS2 on BlueGene

Andrew Cherry acherry at mcs.anl.gov
Fri Apr 20 18:18:53 EDT 2007


Matthew, Sam-

FYI, using jumbo frames is not necessarily that simple.  The IBM file  
servers that came with our BG/L don't support jumbo frames on the  
internal NICs.  Ours are IBM x346 type 8840, and the built-in  
Broadcom NICs couldn't handle jumbo frames.  I imagine other xSeries  
boxes with integrated Broadcom NICs may have similar issues.  We  
ended up buying PCI network cards in order to implement jumbo frames  
in our environment.  Also, you'll need to make sure your network  
switch can handle jumbo frames (ours is a Force10; I don't know the
exact model off the top of my head, but it supports jumbo frames).

The other thing you need to be aware of is that switching to jumbo  
frames is an all-or-nothing proposition; if you do it, you'll have to  
do it for *all* of the hardware on the involved network segment.  You  
can't just change a couple of servers.
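Here is a minimal Python sketch of a segment audit that makes the all-or-nothing point concrete. It assumes you have already collected each device's MTU into a dictionary. The host names and values below are hypothetical. A frame larger than the receiver's MTU is typically dropped on the same segment, so the largest MTU every member can use is the minimum of the configured values.

```python
# Hypothetical inventory: device name -> configured MTU (bytes) on one segment.


def mtu_mismatches(inventory: dict[str, int], target: int) -> list[str]:
    """Return the devices whose MTU differs from the target, sorted by name."""
    return sorted(host for host, mtu in inventory.items() if mtu != target)


def largest_safe_mtu(inventory: dict[str, int]) -> int:
    """The largest MTU every member can receive is the smallest configured one."""
    return min(inventory.values())


if __name__ == "__main__":
    segment = {
        "ionode-01": 8000,
        "ionode-02": 8000,
        "fileserver-a": 8000,
        "fileserver-b": 1500,  # e.g. an onboard NIC that was never switched over
    }
    print(mtu_mismatches(segment, 8000))
    print(largest_safe_mtu(segment))
```

Any device that this check flags drags the whole segment back down to its MTU. That is why changing only a couple of servers does not work.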

I'm Cc:ing a couple of folks at Argonne who worked on getting jumbo  
frames working for our environment; they might be able to warn you of  
any other gotchas.  We're using 8000-byte frames, but if I were
starting from scratch I'd try something closer to 8300 so that an
entire 8192-byte NFS packet can fit in a single frame (avoiding
fragmentation if you're using an 8192-byte NFS rsize/wsize).  Note that
8300 is just a ballpark guess that I haven't been able to confirm.
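The 8300 estimate can be sanity-checked with some header arithmetic. The Python sketch below assumes NFS over UDP on IPv4 without IP options, which means a 20-byte IP header and an 8-byte UDP header. The combined RPC and NFS header size is an explicit, assumed parameter, because it varies with NFS version, procedure, and credentials. The sketch also accounts for the IPv4 rule that every fragment except the last must carry a multiple of 8 payload bytes.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def min_mtu_unfragmented(nfs_data: int, rpc_overhead: int) -> int:
    """Smallest MTU that carries one NFS-over-UDP datagram in a single frame.

    rpc_overhead is an ASSUMED size for the RPC + NFS headers; the real
    value depends on the NFS version, the procedure, and the credentials.
    """
    return IPV4_HEADER + UDP_HEADER + rpc_overhead + nfs_data


def ip_fragments(mtu: int, nfs_data: int, rpc_overhead: int) -> int:
    """Number of IPv4 fragments needed for one NFS-over-UDP datagram."""
    udp_datagram = UDP_HEADER + rpc_overhead + nfs_data
    if udp_datagram <= mtu - IPV4_HEADER:
        return 1  # fits in a single frame, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(udp_datagram / per_fragment)


if __name__ == "__main__":
    print("min MTU:", min_mtu_unfragmented(8192, 100))
    for mtu in (1500, 8000, 8300, 9000):
        print(mtu, ip_fragments(mtu, 8192, 100))
```

The RPC/NFS overhead is assumed to be 100 bytes here. Under that assumption, `min_mtu_unfragmented(8192, 100)` gives 8320, and `ip_fragments` gives 6, 2, 2, and 1 fragments for MTUs of 1500, 8000, 8300, and 9000. So 8300 would fall just short, which fits with treating it as a ballpark figure to verify against real traffic. With NFS over TCP, the transport segments the data itself, so this IP-fragmentation concern applies mainly to UDP mounts.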

Be warned -- in our environment, we started to have problems when we  
got close to 9000 byte frames, so don't go too high.

-Andrew Cherry
  BG/L Support
  Argonne National Laboratory

On Apr 20, 2007, at 5:02 PM, Sam Lang wrote:

>
> Hi Matthew,
>
> I think the version of PVFS in the Zepto release is pvfs2-1.5.1.   
> Besides some performance improvements in the latest release  
> (pvfs-2.6.3), there was a specific bugfix made in PVFS for largish
> MPI-IO jobs.  If you could try the latest (at http://www.pvfs.org/),
> it would help us to verify that you're not running into the same
> problem.
>
> Regarding config options for PVFS on BG/L, make sure you have jumbo
> frames enabled, e.g.:
>
> ifconfig eth0 mtu 8000 up
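After running the ifconfig command, it is worth confirming what the kernel actually accepted, because a NIC that cannot do jumbo frames may refuse the new MTU. Below is a minimal Python sketch for a Linux host with sysfs, which exposes each interface's current MTU as a single integer in /sys/class/net/<iface>/mtu. The `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_mtu(iface: str = "eth0", sysfs_root: str = "/sys/class/net") -> int:
    """Read the MTU the kernel currently has configured for an interface.

    Assumes a Linux sysfs layout: <sysfs_root>/<iface>/mtu holds one integer.
    """
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())
```

For example, once the setting has taken effect, `read_mtu("eth0")` on an I/O node should return 8000.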
>
> Also, you should probably set the TCP buffer sizes explicitly in
> the PVFS config file, fs.conf:
>
> <Defaults>
>         ...
>         TCPBufferSend 524288
>         TCPBufferReceive 1048576
> </Defaults>
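These two settings are per-socket kernel buffer sizes. The Python sketch below requests the same sizes on an ordinary TCP socket and reads back what the kernel granted. It assumes, without checking the PVFS source, that the PVFS TCP transport applies TCPBufferSend and TCPBufferReceive in a similar setsockopt-style way. On Linux the value reported back is typically double the request, because the kernel reserves bookkeeping space. Requests above net.core.wmem_max / net.core.rmem_max are silently capped, so those sysctls may need raising before 1 MiB buffers actually take effect.

```python
import socket

# Sizes from the fs.conf fragment above, in bytes.
TCP_BUFFER_SEND = 524288      # 512 KiB
TCP_BUFFER_RECEIVE = 1048576  # 1 MiB


def tuned_socket(send: int = TCP_BUFFER_SEND,
                 recv: int = TCP_BUFFER_RECEIVE) -> socket.socket:
    """Create a TCP socket with explicit send/receive buffer sizes.

    Set these before connect()/listen(): the receive buffer influences the
    window scaling negotiated during the TCP handshake.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv)
    return sock


if __name__ == "__main__":
    with tuned_socket() as s:
        # On Linux, expect roughly 2x the request, or 2x the sysctl cap.
        print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        print("SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

If the numbers printed are far below what you asked for, the sysctl caps are the likely cause.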
>
> You might also see better performance with an alternative Trove
> method for doing disk I/O:
>
> <StorageHints>
> 	...
> 	TroveMethod alt-aio
> </StorageHints>
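If you are applying both of these fs.conf changes on several servers, a small script can apply them to each server's copy. The `set_option` helper below is hypothetical and is not a PVFS tool. It is a line-based Python sketch that overwrites a `Key Value` line inside the first matching `<Section>` block, or appends that line before the closing tag if it is missing. It does not understand comments or repeated sections with the same name, so review the result before deploying it. The servers presumably need a restart to pick up the new settings.

```python
def set_option(conf_text: str, section: str, key: str, value: str) -> str:
    """Set `key value` inside the first <section>...</section> block."""
    out: list[str] = []
    inside = False  # currently within the target section
    done = False    # option already written
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not done and stripped == f"<{section}>":
            inside = True
        elif inside and stripped.split()[:1] == [key]:
            line = f"\t{key} {value}"  # overwrite the existing setting
            done = True
        elif inside and stripped == f"</{section}>":
            if not done:
                out.append(f"\t{key} {value}")  # not present yet: append it
                done = True
            inside = False
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    conf = "<Defaults>\n\t...\n</Defaults>\n<StorageHints>\n\t...\n</StorageHints>\n"
    conf = set_option(conf, "Defaults", "TCPBufferSend", "524288")
    conf = set_option(conf, "Defaults", "TCPBufferReceive", "1048576")
    conf = set_option(conf, "StorageHints", "TroveMethod", "alt-aio")
    print(conf)
```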
>
>
> Thanks,
>
> -sam
>
> On Apr 20, 2007, at 4:25 PM, Matthew Woitaszek wrote:
>
>>
>> Good afternoon,
>>
>> Michael Oberg and I are attempting to get PVFS2 working on NCAR's
>> 1-rack BlueGene/L system using ZeptoOS. We ran into a snag at over
>> 8 BG/L I/O nodes (>256 compute nodes).
>>
>> We've been using the mpi-io-test program shipped with PVFS2 to test
>> the system. For cases up to and including 8 I/O nodes (256 tasks in
>> coprocessor mode or 512 tasks in virtual node mode), everything works
>> fine. Larger jobs fail with "file not found" error messages, such as:
>>
>>    MPI_File_open: File does not exist, error stack:
>>    ADIOI_BGL_OPEN(54): File /pvfs2/mattheww/_file_0512_co does not exist
>>
>> The file is created on the PVFS2 filesystem, but with a size of zero
>> bytes. We've run the tests with 512 tasks on 256 nodes, and the run
>> successfully created an 8589934592-byte file. Going to 257 nodes fails.
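Matthew's numbers can be cross-checked with a few lines of arithmetic. The Python sketch below uses only figures from his message plus one inferred ratio. Because 8 I/O nodes served 256 compute nodes, it assumes 32 compute nodes per I/O node. Real BG/L partitions are allocated in fixed blocks, so this is a simplification.

```python
import math

FILE_SIZE = 8589934592          # bytes, from the successful 512-task run
TASKS = 512
COMPUTE_PER_IO_NODE = 256 // 8  # inferred: 8 I/O nodes for 256 compute nodes

per_task = FILE_SIZE // TASKS
print(per_task, per_task / 2**20, "MiB per task")
print(FILE_SIZE / 2**30, "GiB total")


def io_nodes_needed(compute_nodes: int) -> int:
    """I/O nodes involved, under the simple 32:1 ratio assumed above."""
    return math.ceil(compute_nodes / COMPUTE_PER_IO_NODE)


print(io_nodes_needed(256), io_nodes_needed(257))
```

The successful run wrote 16 MiB per task, or 8 GiB in total. At 257 compute nodes, a ninth I/O node (and so a ninth PVFS2 client) would join the job. That suggests, though it does not prove, that the failure scales with the number of clients rather than with the amount of data written.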
>>
>> Has anyone seen this behavior before? Are there any PVFS2 server or
>> client configuration options that you would recommend for a BG/L
>> installation like this?
>>
>> Thanks for your time,
>>
>> Matthew
>>
>


