[Pvfs2-users] Problem with restarting pvfs on a cluster

Sam Lang slang at mcs.anl.gov
Tue Oct 9 11:07:32 EDT 2007


On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:

> Hello Mr. Ross,
>
> thanks for your prompt reply.
>
> I believe the config file you mention is (for my case)
> /etc/pvfs2-server.conf-master-pvfs.
> Its contents are:
> ############################
> StorageSpace /pvfs2-storage-space
> HostID "tcp://master-pvfs:3334"
> LogFile /tmp/pvfs2-server.log
> ############################
>
> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for  
> example, is the following:
> ############################
> StorageSpace /pvfs2-storage-space
> HostID "tcp://node1-pvfs:3334"
> LogFile /tmp/pvfs2-server.log
> ############################
>
> Now, on the master this /pvfs2-storage-space is unfortunately directly
> on /, so the mount-timing theory unfortunately has to be discarded.

In the directory listing you gave us for node1 /pvfs2-storage-space,  
there's a lost+found directory.  That only appears if you've mounted  
another volume into that directory.  My guess is that for the master  
node, you've managed to somehow create part of the storage space  
before mounting something to /pvfs2-storage-space, and the rest was  
created after.  You're only seeing what was created before the  
mount.  That's just a guess though.  Can you send us the output of  
'mount' on node1 and master?
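
Alongside the 'mount' output, the same question can be checked directly on each host. The following is only a minimal sketch in Python, not part of PVFS2: the function name describe_storage_space is made up for illustration, and the path is the StorageSpace value quoted above.

```python
import os

def describe_storage_space(path):
    """Report whether `path` exists, is a mount point, and what it holds.

    Hypothetical helper for illustration; not a PVFS2 tool.
    """
    exists = os.path.isdir(path)
    entries = sorted(os.listdir(path)) if exists else []
    return {
        "path": path,
        "exists": exists,
        # True when another filesystem is mounted exactly at `path`.
        "is_mount_point": os.path.ismount(path) if exists else False,
        # lost+found is typically created at the root of an ext2/ext3 volume.
        "has_lost_found": "lost+found" in entries,
        "entries": entries,
    }

if __name__ == "__main__":
    # StorageSpace value taken from the quoted pvfs2-server.conf files.
    for key, value in describe_storage_space("/pvfs2-storage-space").items():
        print(f"{key}: {value}")
```

Run on both node1 and the master: if is_mount_point differs from what you expect, or differs between the time the server started and now, that would point back at the mount-ordering guess.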

-sam

>
> On the nodes, by contrast, /pvfs2-storage-space is on a mounted
> filesystem, /dev/md1, but there everything apparently works, so it
> seems to me that the problem really is with the master node (the
> metadata server).
>
> The suggestion in the pvfs2-server log to use the -f option looks
> very dangerous to me. Or is it safe in the case of the metadata
> server, in the sense that it will reconstruct the data from the IO
> nodes? I cannot understand why the different storage spaces all have
> the directory "744468fe" in common, while the master has nothing else
> besides this empty directory.
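
On the shared "744468fe" directory: the name is exactly the filesystem ID from the quoted pvfs2-fs.conf (1950640382) written in hexadecimal. That every server names its per-filesystem directory this way is an assumption inferred from this match, not something stated in the thread. The arithmetic itself can be checked with a short Python sketch (collection_dirname is a hypothetical helper name):

```python
# "ID" value from the <Filesystem> section of the quoted pvfs2-fs.conf.
FS_ID = 1950640382

def collection_dirname(fs_id):
    """Format a filesystem ID as 8 lowercase hex digits.

    Assumption: the storage space uses this form for the per-filesystem
    directory name; the thread itself does not confirm it.
    """
    return format(fs_id, "08x")

print(collection_dirname(FS_ID))  # prints 744468fe
```

Under that assumption, seeing the same directory name on every server is expected, since they all serve the same filesystem; what is unusual on the master is only that nothing else is there.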
>
> Even if the pvfs2-server process had been killed uncleanly on the
> master (the metadata server), it would not (I assume) have been able
> to delete data in the storage directory...
>
> So this absence of data in /pvfs2-storage-space on the metadata
> server is both disconcerting and confusing...
>
> Hope this mail will help us to proceed further.
>
> Best Regards
> Raimondo
>
> Rob Ross wrote:
>> Hi Raimondo,
>>
>> Two things. One, there is a second config file around that  
>> specifies the storage directory etc. You should be able to find it  
>> in /etc/ also. Please send that to us.
>>
>> An idea is that perhaps /pvfs2-storage-space is a mounted file  
>> system, and that somehow it is getting mounted *after* the server  
>> is started? Just a blind guess. If you try to start the service  
>> after the system has finished booting, does it do the same thing?
>>
>> Thanks,
>>
>> Rob
>>
>> Raimondo Giammanco wrote:
>>> Hello, there.
>>>
>>> I am coming here seeking words of wisdom. I have searched the
>>> interweb and this list but I cannot seem to find useful
>>> information, so I post here.
>>> I apologize if the answer to the question has already been
>>> provided and I could not find it.
>>>
>>> I have a problem with a pvfs2 installation that was set up by a
>>> third party. The cluster was shut down cleanly for scheduled
>>> maintenance on the power lines, and I cannot bring pvfs2 up again.
>>>
>>> Here is the description.
>>>
>>> There is a cluster with a frontend and 9 nodes.
>>>
>>> As far as I understand, the frontend is a metadata server, and the
>>> nodes are IO servers, according to the /etc/pvfs2-fs.conf file
>>> shown below:
>>>
>>> ####################
>>> <Defaults>
>>>         UnexpectedRequests 50
>>>         EventLogging none
>>>         LogStamp datetime
>>>         BMIModules bmi_tcp
>>>         FlowModules flowproto_multiqueue
>>>         PerfUpdateInterval 1000
>>>         ServerJobBMITimeoutSecs 30
>>>         ServerJobFlowTimeoutSecs 30
>>>         ClientJobBMITimeoutSecs 300
>>>         ClientJobFlowTimeoutSecs 300
>>>         ClientRetryLimit 5
>>>         ClientRetryDelayMilliSecs 2000
>>> </Defaults>
>>>
>>> <Aliases>
>>>         Alias master-pvfs tcp://master-pvfs:3334
>>>         Alias node1-pvfs tcp://node1-pvfs:3334
>>>         Alias node2-pvfs tcp://node2-pvfs:3334
>>>         Alias node3-pvfs tcp://node3-pvfs:3334
>>>         Alias node4-pvfs tcp://node4-pvfs:3334
>>>         Alias node5-pvfs tcp://node5-pvfs:3334
>>>         Alias node6-pvfs tcp://node6-pvfs:3334
>>>         Alias node7-pvfs tcp://node7-pvfs:3334
>>>         Alias node8-pvfs tcp://node8-pvfs:3334
>>>         Alias node9-pvfs tcp://node9-pvfs:3334
>>> </Aliases>
>>>
>>> <Filesystem>
>>>         Name pvfs2-fs
>>>         ID 1950640382
>>>         RootHandle 1048576
>>>         <MetaHandleRanges>
>>>                 Range master-pvfs 4-429496732
>>>         </MetaHandleRanges>
>>>         <DataHandleRanges>
>>>                 Range node1-pvfs 429496733-858993461
>>>                 Range node2-pvfs 858993462-1288490190
>>>                 Range node3-pvfs 1288490191-1717986919
>>>                 Range node4-pvfs 1717986920-2147483648
>>>                 Range node5-pvfs 2147483649-2576980377
>>>                 Range node6-pvfs 2576980378-3006477106
>>>                 Range node7-pvfs 3006477107-3435973835
>>>                 Range node8-pvfs 3435973836-3865470564
>>>                 Range node9-pvfs 3865470565-4294967293
>>>         </DataHandleRanges>
>>>         <StorageHints>
>>>                 TroveSyncMeta yes
>>>                 TroveSyncData no
>>>         </StorageHints>
>>> </Filesystem>
>>> ####################
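
The handle ranges in this config can be sanity-checked mechanically. The following is a rough Python sketch, not a PVFS2 tool: parse_ranges and is_contiguous are hypothetical helper names, and the Range lines are copied from the config quoted above. It confirms that the single meta range belongs to master-pvfs and that the ten ranges are adjacent and equally sized.

```python
import re

# Range lines copied from the quoted /etc/pvfs2-fs.conf.
CONF = """\
<MetaHandleRanges>
    Range master-pvfs 4-429496732
</MetaHandleRanges>
<DataHandleRanges>
    Range node1-pvfs 429496733-858993461
    Range node2-pvfs 858993462-1288490190
    Range node3-pvfs 1288490191-1717986919
    Range node4-pvfs 1717986920-2147483648
    Range node5-pvfs 2147483649-2576980377
    Range node6-pvfs 2576980378-3006477106
    Range node7-pvfs 3006477107-3435973835
    Range node8-pvfs 3435973836-3865470564
    Range node9-pvfs 3865470565-4294967293
</DataHandleRanges>
"""

def parse_ranges(text):
    """Return (alias, low, high) tuples for every 'Range' line."""
    found = re.findall(r"Range\s+(\S+)\s+(\d+)-(\d+)", text)
    return [(alias, int(lo), int(hi)) for alias, lo, hi in found]

def is_contiguous(ranges):
    """True if, sorted by start, each range begins right after the previous."""
    ordered = sorted(ranges, key=lambda r: r[1])
    return all(a[2] + 1 == b[1] for a, b in zip(ordered, ordered[1:]))

ranges = parse_ranges(CONF)
sizes = {hi - lo + 1 for _, lo, hi in ranges}
print(len(ranges), is_contiguous(ranges), sizes)
```

For the values above the ranges come out contiguous with a single common size, so the config itself looks consistent: master-pvfs holds only the metadata handles and node1 to node9 hold the data handles.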
>>>
>>> The nodes are apparently working correctly: at boot the
>>> /etc/init.d/pvfs2 script worked, and the log file
>>> (/tmp/pvfs2-server.log) gives me for a node:
>>> ####################
>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>> ####################
>>>
>>> On the master, instead, it gives:
>>> ####################
>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>> [E 10/09 11:09]
>>> ***********************************************
>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>
>>> [E 10/09 11:09] Storage initialization failed.  The most common  
>>> reason
>>> for this is that the storage space has not yet been
>>> created or is located on a partition that has not yet
>>> been mounted.  If you'd like to create the storage space,
>>> re-run this program with a -f option.
>>> [E 10/09 11:09]
>>> ***********************************************
>>> [E 10/09 11:09] Error: Could not initialize server interfaces;  
>>> aborting.
>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>> ####################
>>>
>>> Now, the storage space on the nodes is populated:
>>> ####################
>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>> 744468fe  collections.db  lost+found  storage_attributes.db
>>> ####################
>>> but on the master (frontend) it is not:
>>> ####################
>>> [root@master ~]# ls /pvfs2-storage-space/
>>> 744468fe
>>> ####################
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Thanks Again
>>>
>>> Raimondo
>>> _______________________________________________
>>> Pvfs2-users mailing list
>>> Pvfs2-users at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>
