[Pvfs2-users] pvfs2 stability

Andrea Carotti and.carotti at farmchim.uniba.it
Mon May 22 12:17:05 EDT 2006


Dear Mr Murali,
I'm using the 1.4.0 version.
Everything seems to work: pvfs2-fs-dump, pvfs2-ping, pvfs2-ls... and the
usual commands like mkdir and cp also work well on all the nodes and on the
server.
One strange thing also happens at login:
[user1 at om1:~]su
Password:
open: No such file or directory
apparent state: unit 27 named -e
lately writing direct unformatted external IO
Segnale di annullamento (Italian locale: abort signal)
open: No such file or directory
apparent state: unit 27 named -f
lately writing direct unformatted external IO
Segnale di annullamento
open: No such file or directory
apparent state: unit 27 named -f
lately writing direct unformatted external IO
Segnale di annullamento
open: No such file or directory
apparent state: unit 27 named -f
lately writing direct unformatted external IO
Segnale di annullamento
open: No such file or directory
apparent state: unit 27 named -f
lately writing direct unformatted external IO
Segnale di annullamento
open: No such file or directory
apparent state: unit 27 named -e
lately writing direct unformatted external IO
Segnale di annullamento

thanks a lot
Andrea
----- Original Message ----- 
From: "Murali Vilayannur" <vilayann at mcs.anl.gov>
To: "Andrea Carotti" <and.carotti at farmchim.uniba.it>
Cc: <pvfs2-users at beowulf-underground.org>
Sent: Monday, May 22, 2006 6:07 PM
Subject: Re: [Pvfs2-users] pvfs2 stability


> Hi Andrea,
> Hmm..Nothing looks out of the ordinary from the config files..
> Since you mentioned that the VFS interface does not work, could you
> confirm if the pvfs system interface based tools work or not?
> (i.e. pvfs2-fs-dump, pvfs2-ping, pvfs2-ls etc under src/apps/admin)
> It would be good to narrow down which component(s) is/are causing all
> these failures...Any other information from the logs (or running all the
> components with extra verbose logging) could also help narrow down what
> the issue might be.. BTW, are you using pvfs2 1.4.0 or CVS head?
> Thanks,
> Murali
>
>> cat /home/Application/pvfs/conf/pvfs2-fs.conf
>> <Defaults>
>>         UnexpectedRequests 50
>>         LogFile /tmp/pvfs2-server.log
>>         EventLogging none
>>         LogStamp usec
>>         BMIModules bmi_tcp
>>         FlowModules flowproto_multiqueue
>>         PerfUpdateInterval 1000
>>         ServerJobBMITimeoutSecs 30
>>         ServerJobFlowTimeoutSecs 30
>>         ClientJobBMITimeoutSecs 300
>>         ClientJobFlowTimeoutSecs 300
>>         ClientRetryLimit 5
>>         ClientRetryDelayMilliSecs 2000
>> </Defaults>
>>
>> <Aliases>
>>         Alias dom1 tcp://dom1:3334
>>         Alias dom2 tcp://dom2:3334
>>         Alias dom3 tcp://dom3:3334
>>         Alias dom4 tcp://dom4:3334
>>         Alias om1 tcp://om1:3334
>>         Alias om2 tcp://om2:3334
>>         Alias om3 tcp://om3:3334
>>         Alias om4 tcp://om4:3334
>>         Alias om5 tcp://om5:3334
>> </Aliases>
>>
>> <Filesystem>
>>         Name pvfs2-fs
>>         ID 1869706856
>>         RootHandle 1048576
>>         <MetaHandleRanges>
>>                 Range om1 4-429496732
>>         </MetaHandleRanges>
>>         <DataHandleRanges>
>>                 Range dom1 429496733-858993461
>>                 Range dom2 858993462-1288490190
>>                 Range dom3 1288490191-1717986919
>>                 Range dom4 1717986920-2147483648
>>                 Range om1 2147483649-2576980377
>>                 Range om2 2576980378-3006477106
>>                 Range om3 3006477107-3435973835
>>                 Range om4 3435973836-3865470564
>>                 Range om5 3865470565-4294967293
>>         </DataHandleRanges>
>>         <StorageHints>
>>                 TroveSyncMeta yes
>>                 TroveSyncData no
>>                 AttrCacheKeywords datafile_handles,metafile_dist
>>                 AttrCacheKeywords dir_ent, symlink_target
>>                 AttrCacheSize 4093
>>                 AttrCacheMaxNumElems 32768
>>         </StorageHints>
>> </Filesystem>
>>
>> Om1 is the server/client hostname
>> cat /home/Application/pvfs/conf/pvfs2-server.conf-om1
>> StorageSpace /pvfs2-storage-space
>> HostID "tcp://om1:3334"
>>
>> Om2 is a client hostname
>> cat /home/Application/pvfs/conf/pvfs2-server.conf-om2
>> StorageSpace /pvfs2-storage-space
>> HostID "tcp://om2:3334"
>>
>>
>> Let me know if you need more information.
>> Thanks
>> Andrea
>>
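
A quick way to double-check the handle ranges in the pvfs2-fs.conf quoted above is to confirm that they neither overlap nor leave gaps, and that RootHandle falls inside the meta range. The following is only a minimal sketch: the numbers are copied by hand from the quoted file, and the helper name find_problems is hypothetical, not part of PVFS2.

```python
# Handle ranges copied by hand from the quoted pvfs2-fs.conf (not parsed from disk).
ROOT_HANDLE = 1048576
META_RANGES = [("meta:om1", 4, 429496732)]
DATA_RANGES = [
    ("data:dom1", 429496733, 858993461),
    ("data:dom2", 858993462, 1288490190),
    ("data:dom3", 1288490191, 1717986919),
    ("data:dom4", 1717986920, 2147483648),
    ("data:om1", 2147483649, 2576980377),
    ("data:om2", 2576980378, 3006477106),
    ("data:om3", 3006477107, 3435973835),
    ("data:om4", 3435973836, 3865470564),
    ("data:om5", 3865470565, 4294967293),
]


def find_problems(ranges):
    """Return human-readable notes about inverted, overlapping or gapped ranges.

    `ranges` is a list of (label, start, end) tuples with inclusive bounds.
    find_problems is a hypothetical helper written for this sketch.
    """
    problems = []
    ordered = sorted(ranges, key=lambda r: r[1])
    for label, start, end in ordered:
        if start > end:
            problems.append(f"{label}: start {start} is greater than end {end}")
    # Compare each range with the one that follows it in handle order.
    for (l1, _s1, e1), (l2, s2, _e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            problems.append(f"{l1} overlaps {l2}")
        elif s2 > e1 + 1:
            problems.append(f"gap between {l1} and {l2}")
    return problems


if __name__ == "__main__":
    issues = find_problems(META_RANGES + DATA_RANGES)
    print("problems:", issues if issues else "none found")
    _label, meta_start, meta_end = META_RANGES[0]
    print("root handle inside meta range:", meta_start <= ROOT_HANDLE <= meta_end)
```

A clean result here only says the ranges are internally consistent; it says nothing about whether the servers are reachable.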
>> ----- Original Message -----
>> From: "Murali Vilayannur" <vilayann at mcs.anl.gov>
>> To: "Andrea Carotti" <and.carotti at farmchim.uniba.it>
>> Cc: <pvfs2-users at beowulf-underground.org>
>> Sent: Monday, May 22, 2006 5:45 PM
>> Subject: Re: [Pvfs2-users] pvfs2 stability
>>
>>
>> > Hi Andrea,
>> > It does look a bit strange to see these messages and yet have the FS
>> > working..
>> > Could you post your fs.conf and server.conf files?
>> > thanks,
>> > Murali
>> >
>> > On Mon, 22 May 2006, Andrea Carotti wrote:
>> >
>> >> Hi all,
>> >> I'm new to this list and to the pvfs2 program. I'm using it on our
>> >> home-made cluster (9 nodes) running an openMosix kernel 2.4.22-3 and
>> >> Fedora Core 2.
>> >> I've installed it with one node running as meta server, PVFS2 server
>> >> and data server, and all the others as data servers.
>> >> I've also compiled and installed the module.
>> >> This is my current configuration:
>> >> 1) On all nodes I have an entry in /etc/fstab like this:
>> >> tcp://om1:3334/pvfs2-fs /mnt/pvfs2 pvfs2 default,noauto 0 0
>> >> 2) I've added these lines to rc.local:
>> >> insmod /lib/modules/2.4.22-oM3src/kernel/fs/pvfs2/pvfs2.o
>> >> /home/Application/pvfs/sbin/pvfs2-client -p /home/Application/pvfs/sbin/pvfs2-client-core
>> >> mount -t pvfs2 tcp://om1:3334/pvfs2-fs /mnt/pvfs2
>> >> 3) I've enabled the default service at startup on all the nodes:
>> >> /etc/init.d/pvfs2-server
>> >>
>> >> I'm encountering some problems with its usage:
>> >> if I start the server (/etc/init.d/pvfs2-server start) everything seems
>> >> OK, but on the server /tmp/pvfs2-client.log appears with these errors:
>> >>
>> >> [E 16:57:50.651742] msgpair failed, will retry:: Broken pipe
>> >> [E 16:57:52.691656] msgpair failed, will retry:: Connection refused
>> >> [E 16:57:54.731666] msgpair failed, will retry:: Connection refused
>> >> [E 16:57:56.771657] msgpair failed, will retry:: Connection refused
>> >> [E 16:57:58.811658] msgpair failed, will retry:: Connection refused
>> >> [E 16:58:00.851658] msgpair failed, will retry:: Connection refused
>> >> [E 16:58:00.851731] *** msgpairarray_completion_fn: msgpair to server tcp://om1:3334 failed: Connection refused
>> >> [E 16:58:00.851750] *** Out of retries.
>> >> [E 16:58:00.851769] getattr_object_getattr_failure : Connection refused
>> >>
>> >> However, it seems to work: I can write to /mnt/pvfs2, make directories,
>> >> and so on with the normal commands (cp, mkdir, etc.).
>> >>
>> >> But during the day something goes wrong: the next day I can never see
>> >> /mnt/pvfs2 without restarting the server, and in /var/log/messages
>> >> I see:
>> >> May 18 23:21:20 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 18 23:27:20 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 19 01:06:07 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 19 04:08:26 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 19 04:15:40 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 19 23:20:48 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 19 23:26:48 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 20 01:06:04 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 20 04:08:25 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 20 04:15:34 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 20 23:21:09 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 20 23:27:09 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 21 01:06:05 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 21 04:08:31 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 21 04:15:41 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 21 23:24:05 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 22 01:06:03 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >> May 22 04:08:33 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.
>> >> May 22 04:15:41 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.
>> >>
>> >> The same errors appear at the same times every day.
>> >> Sorry for the long message... I hope someone can help.
>> >> Thanks
>> >> Andrea
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Pvfs2-users mailing list
>> >> Pvfs2-users at beowulf-underground.org
>> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>> >>
>> >>
>>
>>
>> 
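
The kernel timeout messages in the first message of this thread recur at nearly the same times each day (around 23:20, 01:06 and 04:08-04:15). One way to make that pattern visible is to count the messages by hour and by the operation that timed out. The following is a minimal sketch, assuming the syslog line format shown in the thread; the function name timeouts_by_hour and the SAMPLE list are hypothetical and exist only for illustration. Whether the clustering points to scheduled jobs touching /mnt/pvfs2 is a guess that the thread does not confirm.

```python
import re
from collections import Counter

# Matches the pvfs2 kernel timeout lines as they appear in the quoted
# /var/log/messages excerpt; captures the hour and the operation name.
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2} (\d{2}):\d{2}:\d{2} \S+ kernel: pvfs2: (\w+) -- wait timed out"
)


def timeouts_by_hour(lines):
    """Count timeout messages per (hour, operation) pair.

    timeouts_by_hour is a hypothetical helper written for this sketch.
    Lines that do not match the expected format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# A few lines taken from the excerpt in the thread, joined onto single lines.
SAMPLE = [
    "May 18 23:21:20 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.",
    "May 18 23:27:20 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.",
    "May 19 01:06:07 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.",
    "May 19 04:08:26 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.",
    "May 19 04:15:40 om1 kernel: pvfs2: pvfs2_inode_getattr -- wait timed out and retries exhausted. aborting attempt.",
    "May 19 23:20:48 om1 kernel: pvfs2: pvfs2_statfs -- wait timed out and retries exhausted. aborting attempt.",
]

if __name__ == "__main__":
    for (hour, operation), count in sorted(timeouts_by_hour(SAMPLE).items()):
        print(f"{hour:02d}:xx  {operation:<22} {count}")
```

On the affected node, the same function could be fed the lines of /var/log/messages instead of SAMPLE and the busy hours compared with whatever is scheduled to run at those times.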


