[Pvfs2-users] PVFS2 on BlueGene

Sam Lang slang at mcs.anl.gov
Mon Apr 23 15:45:15 EDT 2007


On Apr 23, 2007, at 12:37 PM, Matthew Woitaszek wrote:

>
> Sam and Nathan,
>
> Thanks for your quick replies, and thank you for your suggestions. I tried
> them both, but it turned out that it was a BlueGene/L mount problem, and not
> a PVFS2 problem at all.
>
> On Friday, we turned on additional debugging messages, and the following
> messages appeared in the PVFS2 logs on the metadata server:
>
> [A 04/20 17:36] mattheww.users at 172.30.0.59 H=1048576 S=0x2a96943530:
>    lookup_path: path: mattheww, handle: 1047133
> [A 04/20 17:36] mattheww.users at 172.30.0.59 H=1048576 S=0x2a96943530:
>    lookup_path: finish (Success)
> [A 04/20 17:36] mattheww.users at 172.30.0.55 H=1047133 S=0x2a967e0470:
>    lookup_path: path: _file_258_co, lookup failed
> [A 04/20 17:36] mattheww.users at 172.30.0.55 H=1047133 S=0x2a967e0470:
>    lookup_path: finish (No such file or directory)
>
> This had us convinced that it was a problem related to the metadata server,
> and that all of the clients were sending requests properly. To investigate
> further, we ran a very simple program that just barriers, calls fopen() once
> for each process with a filename based on rank, and sums the number of file
> descriptors that were returned. Sure enough, in cases with more than 256
> tasks, some of the fopen() calls didn't return a file descriptor.
>
> With that, it didn't seem to be an MPI-IO problem. Looking at client logs, we
> found that when booting more than eight 32-node partitions, some of the
> partitions weren't properly mounting pvfs2. A few changes to remount if
> required during the boot process fixed the problem. Since then, everything's
> worked fine. I suppose the moral is: "Make sure that your clients are
> mounting the file system!" Those "lookup failed" messages were quite
> perplexing and definitely led me to look in the wrong place first.
>
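
For anyone who wants to repeat the open test described above, here is a
minimal sketch of the per-rank logic. It is not the program that was actually
run: it is written in Python, the names try_open and count_successful_opens
and the "_file_NNNN" naming are hypothetical, and a plain loop over rank
numbers stands in for the MPI barrier and reduction.

```python
import os
import tempfile


def try_open(directory, rank):
    """Create one file named after the rank; return 1 on success, 0 on failure."""
    # Hypothetical naming scheme, chosen only for illustration.
    path = os.path.join(directory, "_file_%04d" % rank)
    try:
        with open(path, "w"):
            pass
        return 1
    except OSError:
        # Covers a missing directory, permission problems, and similar failures.
        return 0


def count_successful_opens(directory, ntasks):
    # In an MPI program each rank would call try_open() once after a barrier
    # and the results would be summed with a reduction. Here a serial loop
    # over rank numbers stands in for that.
    return sum(try_open(directory, rank) for rank in range(ntasks))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        opened = count_successful_opens(scratch, 8)
        print("%d of 8 opens succeeded" % opened)
```

If the count comes back lower than the number of tasks, the missing opens
point at the clients whose view of the directory differs from the rest.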

Matthew,

Thanks for the report.  Nice to hear that it's not a bug somewhere in  
the PVFS/ROMIO path.  :-)

It might be helpful to disable read and write permissions on the pvfs  
mountpoint (/pvfs) on the IO nodes when the pvfs volume isn't mounted  
-- in fact you can probably just do chmod 000 /pvfs.  This will  
prevent your apps from writing to or reading from the /pvfs directory  
if it's not actually mounted to a pvfs volume.  Once you mount /pvfs,  
it gets 777 permissions with the sticky bit.  So if /pvfs isn't actually  
mounted, your apps will get permission-denied errors (EACCES) pretty  
early in the IO process.
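
A boot script or job prologue could also check the mount explicitly. The
following is only a sketch under a few assumptions: the IO nodes expose a
Linux-style /proc/mounts, the filesystem type shows up there as "pvfs2", and
the function names is_mounted and pvfs_ready are made up for illustration.

```python
def is_mounted(mountpoint, fstype, mounts_text):
    """Return True if mounts_text lists mountpoint with the given fs type."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint and fields[2] == fstype:
            return True
    return False


def pvfs_ready(mountpoint="/pvfs", mounts_file="/proc/mounts"):
    # Assumption: the mount table is readable at mounts_file and reports
    # the filesystem type as "pvfs2".
    with open(mounts_file) as f:
        return is_mounted(mountpoint, "pvfs2", f.read())
```

If pvfs_ready() returns False, the script can retry the mount or fail the boot
instead of letting jobs write into the bare directory.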

Also, you can mount the pvfs volume from any of the pvfs servers  
(including the IO servers), which can help to distribute the mount  
workload when the IO nodes start up and all try to mount at once.  At  
ANL, the IO nodes pick a server randomly when mounting.
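
As a rough illustration of the random pick (not the actual ANL boot script),
something like the sketch below would do. The server host names, the port
3334, and the filesystem name "pvfs2-fs" are assumptions taken from typical
PVFS2 example configurations, and the helper names are hypothetical; adjust
them to match your own config.

```python
import random


def pick_mount_source(servers, fs_name="pvfs2-fs", port=3334, rng=random):
    """Choose one server at random and build a PVFS2 mount source string."""
    host = rng.choice(servers)
    return "tcp://%s:%d/%s" % (host, port, fs_name)


def mount_command(servers, mountpoint="/pvfs", rng=random):
    # Returns the argument list only; running it (for example with
    # subprocess.run) is left to the boot script.
    source = pick_mount_source(servers, rng=rng)
    return ["mount", "-t", "pvfs2", source, mountpoint]


if __name__ == "__main__":
    # Hypothetical server names.
    print(" ".join(mount_command(["pvfs-s1", "pvfs-s2", "pvfs-s3"])))
```

Because each IO node draws independently, the mount requests spread roughly
evenly across the servers without any coordination between nodes.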

> My apologies for bothering everyone with this, and again, thanks for your
> quick offers of assistance. I really appreciate it!

It helps to hear reports like this so we know what to look for when  
the same thing happens to us in the future.  :-)

-sam

>
> Matthew
>
> -----Original Message-----
> From: Sam Lang [mailto:slang at mcs.anl.gov]
> Sent: Friday, April 20, 2007 4:53 PM
> To: Matthew Woitaszek
> Cc: pvfs2-users at beowulf-underground.org
> Subject: Re: [Pvfs2-users] PVFS2 on BlueGene
>
>
>
> Hi Matthew,
>
> Does mpi-io-test consistently fail with 257 nodes (9 IO nodes), or do
> you get any successful runs there?  Are there any messages in the
> pvfs server logs (/tmp/pvfs2-server.log)?
>
> Thanks,
>
> -sam
>
>
> On Apr 20, 2007, at 4:25 PM, Matthew Woitaszek wrote:
>
>>
>> Good afternoon,
>>
>> Michael Oberg and I are attempting to get PVFS2 working on NCAR's 1-rack
>> BlueGene/L system using ZeptoOS. We ran into a snag with more than 8 BG/L
>> I/O nodes (>256 compute nodes).
>>
>> We've been using the mpi-io-test program shipped with PVFS2 to test the
>> system. For cases up to and including 8 I/O nodes (256 coprocessor or 512
>> virtual node mode tasks), everything works fine. Larger jobs fail with file
>> not found error messages, such as:
>>
>>    MPI_File_open: File does not exist, error stack:
>>    ADIOI_BGL_OPEN(54): File /pvfs2/mattheww/_file_0512_co does not exist
>>
>> The file is created on the PVFS2 filesystem and has a zero-byte size. We've
>> run the tests with 512 tasks on 256 nodes, and it successfully created an
>> 8589934592-byte file. Going to 257 nodes fails.
>>
>> Has anyone seen this behavior before? Are there any PVFS2 server or client
>> configuration options that you would recommend for a BG/L installation like
>> this?
>>
>> Thanks for your time,
>>
>> Matthew
>>
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>
>


