[Pvfs2-users] Heavy read workload and "Permission denied" errors
Phil Carns
carns at mcs.anl.gov
Tue Apr 29 11:09:29 EDT 2008
Hi Tiankai,
I believe that Sam may have fixed this problem in the 2.7.1 release,
actually. Here is the mailing list post regarding a similar issue:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2008-March/003975.html
and here is the associated cvs change set:
http://www.pvfs.org/fisheye/changelog/PVFS/?cs=MAIN:slang:20080325192122
Could you possibly try 2.7.1 and see if that fixes your problem?
-Phil
Tu, Tiankai wrote:
> Hi Sam,
>
> I didn't have a chance to test your suggested configuration (-n 0 -a 0)
> on the 6-node cluster. But recently, I installed and experimented with
> PVFS on a larger cluster with 64 nodes using your proposed runtime
> flags. It worked most of the time, but the "Permission denied" error
> still showed up occasionally. The details of the setup are listed below.
>
> PVFS2 configuration:
>
> - A 64-node Linux cluster, each node has 8 cores
> - Kernel version 2.6.22.15-8smp
> - Pvfs-2.7.0 installed
> - All 64 nodes used as both metadata servers and IO servers
> - PVFS kernel module installed on all 64 nodes
> - Regular open/read/close calls from within applications
> - Default file striping on all servers
>
> Application characteristics:
> - Parallel Python programs, using 4 of the 8 cores on each node
> - A large number of parallel read threads
> - Mostly independent read traces; occasionally shared accesses to the
> same file, but by no more than two threads
> - Large, equally-sized files (> 64 MB)
> - Each thread opens a file, reads in the content of the entire file
> (most of the time), extracts data of interest, closes the file and
> moves to the next file
> - The sequence of files to be accessed by each thread is predetermined
> (i.e., no runtime arbitration)
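A minimal Python sketch of the per-thread access pattern listed above, which might help when trying to reproduce the workload on a test mount. It is hypothetical, not the application's actual code: `read_worker`, `run_readers` and the `extract` callback are placeholder names, and plain `open()` is assumed to reach PVFS through the kernel module like any other path.

```python
import threading


def read_worker(paths, extract, results):
    """Read each file in a fixed, predetermined order (no runtime arbitration)."""
    for path in paths:
        # Open the file, read its entire content, and close it before moving on.
        with open(path, "rb") as f:
            data = f.read()
        # Placeholder hook standing in for "extract data of interest".
        results.append(extract(data))


def run_readers(file_lists, extract):
    """Start one thread per predetermined file list and wait for all of them."""
    results = [[] for _ in file_lists]
    threads = [
        threading.Thread(target=read_worker, args=(paths, extract, out))
        for paths, out in zip(file_lists, results)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

An `OSError` raised inside a worker (for example the `[Errno 13]` reported below) ends only that thread: `threading` prints the traceback, and `run_readers` still returns the partial results.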
>
> An error example:
>
> Cannot open file:
> /scratch/mnt/pvfs2/dataset/merged_frameset_64MB/conduction/trj.dtr_20071207222232/frame000000025
> [Errno 13]. Permission denied.
>
> Extra information:
>
> A number of [Errno 11] "Resource temporarily unavailable" errors showed
> up earlier. After I changed the default PVFS configuration as follows,
> errno 11 no longer appeared.
>
> <Defaults>
> UnexpectedRequests 2048
> ServerJobBMITimeoutSecs 30
> ServerJobFlowTimeoutSecs 30
> ClientJobBMITimeoutSecs 5
> ClientJobFlowTimeoutSecs 5
> ClientRetryLimit 8
> ClientRetryDelayMilliSecs 0
>
> TroveMaxConcurrentIO 64
> </Defaults>
>
> Tiankai
>
>
>
> -----Original Message-----
> From: Sam Lang [mailto:slang at mcs.anl.gov]
> Sent: Tuesday, March 18, 2008 11:23 AM
> To: Tu, Tiankai
> Cc: pvfs2-users at beowulf-underground.org
> Subject: Re: [Pvfs2-users] Heavy read workload and "Permission denied" errors
>
>
> Hi Tiankai,
>
> I've been debugging something similar I think, but I'm not able to
> reproduce the EACCES (Permission denied) error with only a few nodes.
> It would be helpful to eliminate a few things to isolate the problem,
> and to see whether we're both looking at the same bug.
>
> Can you disable the name and attribute cache in the client daemon? To
> do that, you should be able to start the pvfs2-client with -n 0 -a 0.
> With those options, does the problem persist?
>
> Are your nodes x86_64?
>
> What happens if you just use one node as a metadata server instead of
> all 6?
>
> Thanks,
> -sam
>
> On Mar 17, 2008, at 11:20 AM, Tu, Tiankai wrote:
>
>> I have been testing whether PVFS2 can be used to support large-scale
>> read-intensive parallel workloads, in particular post-simulation data
>> analysis. Although the preliminary results (on a small cluster) are
>> encouraging when everything works, there have been a few occasions
>> where mysterious "Permission denied" errors occurred and the
>> applications halted.
>>
>> Below is the system hardware/software setup:
>>
>> - 6 compute nodes each with 8 cores, 16 GB memory, 170 GB free disk
>> space managed by xfs.
>> - Each node is connected to a 10 GigE switch by a 1 GigE link
>> - Linux kernel: 2.6.22.15-7smp
>>
>> PVFS setup
>>
>> - pvfs-2.7.0 installed
>> - All 6 nodes used as both metadata servers and IO servers
>> - The same 6 nodes used to run application codes (as pvfs clients)
>> - pvfs kernel module installed on all the nodes
>> - pvfs mounted with the local hostname specified as the metadata
>> server on each node
>> - regular unix open/read/close calls from within the applications
>> - Default file striping on all the servers
>>
>> Application characteristics:
>>
>> - Parallel Python programs
>> - A large number of parallel read threads
>> - Mostly independent read traces; occasionally shared accesses to the
>> same file but by no more than 2 threads
>> - Large, equally-sized files (> 64 MB)
>> - Each thread opens a file, reads in the content of the entire file
>> (most of the time), extracts data of interest, closes the file and
>> moves to the next file
>> - The sequence of files to be accessed by each thread is predetermined
>> (i.e., no runtime arbitration)
>> - Experiments run on configurations with different numbers of nodes and
>> different numbers of cores per node; the total number of read threads
>> is (number of nodes x cores per node)
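The thread count and the predetermined per-thread file sequences can be illustrated with a short sketch. The round-robin split is an assumption (the post does not say how files were assigned to threads), and `assign_files` is a hypothetical helper.

```python
def assign_files(files, num_nodes, cores_per_node):
    """Split a file list into one fixed sequence per read thread.

    Total threads = num_nodes * cores_per_node, as in the experiments.
    Round-robin striding is an assumption made for illustration.
    """
    num_threads = num_nodes * cores_per_node
    # Thread k gets files k, k + N, k + 2N, ... so every sequence is fixed
    # before any thread starts and no runtime arbitration is needed.
    return [files[k::num_threads] for k in range(num_threads)]
```

For the 6-node, 4-threads-per-node example, `assign_files(files, 6, 4)` yields 24 fixed sequences, one per read thread.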
>>
>> Error:
>> - An example (6 nodes, 4 threads per node): Cannot open file
>> /scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844
>> [Errno 13] Permission denied:
>> '/scratch/mnt/pvfs2/merged_frameset_64MB/p2auto/00000001/trj/frame000000844'
>> - Similar errors encountered in other node/thread configurations
>> - The files reported as inaccessible were all verified to be
>> accessible from all 6 compute/storage nodes
>>
>>
>> Extra information:
>> - On the first trial with PVFS, a different error, "[Errno 11] Resource
>> temporarily unavailable", occurred multiple times along with "[Errno 13]
>> Permission denied."
>> - The PVFS configuration was changed to increase the number of retries
>> from 5 to 10 and the retry delay from 2 to 2.5 sec
>> - [Errno 11] did not show up again, but [Errno 13] showed up more often
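The retry change described above was made in the PVFS client configuration. For comparison, a hypothetical application-level analogue in Python might retry only on EAGAIN (errno 11, "Resource temporarily unavailable") and re-raise EACCES (errno 13, "Permission denied") immediately, so permission failures stay visible. `open_with_retry` and its defaults (10 retries, 2.5 s delay, mirroring the numbers above) are assumptions for illustration; a wrapper like this would not address the underlying "Permission denied" problem that Phil suggests was fixed in 2.7.1.

```python
import errno
import time


def open_with_retry(path, mode="rb", retries=10, delay=2.5, opener=open):
    """Retry opener() on EAGAIN only; let every other error propagate.

    Hypothetical helper: makes at most retries + 1 attempts in total.
    """
    for attempt in range(retries + 1):
        try:
            return opener(path, mode)
        except OSError as e:
            # EAGAIN may be transient, so wait and try again.
            # EACCES (and anything else) is re-raised at once, as is
            # EAGAIN on the final attempt.
            if e.errno != errno.EAGAIN or attempt == retries:
                raise
            time.sleep(delay)
```

In Python 3, `OSError(errno.EACCES, ...)` is raised as `PermissionError` and `OSError(errno.EAGAIN, ...)` as `BlockingIOError`, so callers can also catch those subclasses directly.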
>>
>> Thanks for the help.
>> Tiankai
>>
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>