[Pvfs2-users] Problem with restarting pvfs on a cluster

Raimondo Giammanco giamma at vki.ac.be
Wed Oct 10 04:51:45 EDT 2007


Hello Mr. Lang,

 As far as I understand, /pvfs2-storage-space on the master is
not a mount point. /etc/fstab does not mention it,
and the directory it contains (744468fe) has a timestamp
from the day we had to shut down the master, so
I do not think anything was ever mounted there.
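
For what it is worth, the two checks described above (is the path a
mount point right now, and does /etc/fstab mention it) could be sketched
roughly as follows. This is only an illustration: the function names
fstab_mount_points and describe are made up for this note and are not
part of PVFS2 or of any system tool.

```python
import os


def fstab_mount_points(fstab_text):
    """Return the mount points listed in fstab-style text, skipping comments."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            # The second field of an fstab entry is the mount point.
            points.append(fields[1])
    return points


def describe(path, fstab_text):
    """Report whether path is mounted now and whether fstab lists it."""
    norm = os.path.normpath(path)
    listed = norm in [os.path.normpath(p) for p in fstab_mount_points(fstab_text)]
    return {
        "path": norm,
        "mounted_now": os.path.ismount(norm),  # False if the path does not exist
        "in_fstab": listed,
    }
```

On the master one would call describe("/pvfs2-storage-space",
open("/etc/fstab").read()); both "mounted_now" and "in_fstab" being
False would match what I describe above.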

So I am fairly certain that /pvfs2-storage-space on the master
held the metadata, but it now contains nothing except that empty directory.
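
To make the comparison with the nodes concrete, here is a small sketch
that reports which entries are missing from a storage directory listing.
The expected names are simply those seen in the node1 listing quoted
further down in this thread; that they are required on a metadata server
too is my assumption, not something taken from the PVFS2 documentation.
The helper name missing_entries is hypothetical.

```python
# Assumption: these names come from the node1 listing in this thread,
# not from an authoritative description of the PVFS2 storage layout.
EXPECTED_ENTRIES = ("collections.db", "storage_attributes.db")


def missing_entries(listing, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not appear in listing, sorted."""
    return sorted(set(expected) - set(listing))
```

With the listings from this thread, the node1 directory has nothing
missing, while the master (which only has 744468fe) is missing both
files. On a live system the listing would come from
os.listdir("/pvfs2-storage-space").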

If I were to initialize it with the -f option, would the server then
reconstruct the data from the IO nodes, where everything seems correct
and the pvfs2-server process started correctly?

This seems rather risky to me.
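
Whatever the answer, I would first copy the current directory aside
before trying anything. A minimal sketch of such a copy is below; the
function name backup_dir and the idea of a timestamped destination are
my own choices for illustration, not a PVFS2 procedure.

```python
import os
import shutil
import time


def backup_dir(src, backup_root):
    """Copy src into backup_root under a timestamped name; return the new path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(src)) + "." + stamp
    dst = os.path.join(backup_root, name)
    # copytree raises an error if dst already exists, so an earlier
    # backup is never silently overwritten.
    shutil.copytree(src, dst, symlinks=True)
    return dst
```

For example, backup_dir("/pvfs2-storage-space", "/root/backups") would
leave the original untouched and return the path of the copy
(/root/backups is only a placeholder).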

Thanks for your help.

Raimondo


Sam Lang wrote:
>
> On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:
>
>> Hello Mr. Lang,
>>
>>  the master is a different type of unit from the nodes, which are
>> blades in a rack-mounted cluster.
>>
>> On the master, the mount command gives:
>> ##################
>> /dev/sda1 on / type ext3 (rw)
>> none on /proc type proc (rw)
>> none on /sys type sysfs (rw)
>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>> usbfs on /proc/bus/usb type usbfs (rw)
>> none on /dev/shm type tmpfs (rw)
>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
>> ##################
>>
>>
>> while on a node it gives:
>> ##################
>> /dev/ram0 on / type ext2 (rw)
>> none on /proc type proc (rw)
>> none on /sys type sysfs (rw)
>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>> usbfs on /proc/bus/usb type usbfs (rw)
>> none on /dev/shm type tmpfs (rw)
>> /dev/md0 on /tmp type ext3 (rw)
>> /dev/md1 on /pvfs2-storage-space type ext3 (rw)
>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> 10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
>> 10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
>> 10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
>> #####################
>>
>>
>> The difference is, I believe, that the master has a hardware raid,
>
> Is the hardware raid /dev/sda1 mounted to / ?  If not, maybe the 
> hardware raid on the master needs to be mounted to /pvfs2-storage-space?
>
>> while the nodes have two small hard disks in software RAID for the
>> system and temporary data, and two big ones, also in software RAID,
>> for pvfs.
>
> Ok that explains the lost+found.  FYI, while the /pvfs2-storage-space 
> may exist as a directory in /, it can also be a mountpoint for 
> something else, so its contents may not be visible (at least the 
> contents you would expect) if you haven't mounted everything properly.
>
> -sam
>
>>
>> Regards,
>> Raimondo
>>
>>
>>>
>>> On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
>>>
>>>> Hello Mr. Ross,
>>>>
>>>> thanks for your prompt reply.
>>>>
>>>> I believe the config file you mention is (in my case)
>>>> /etc/pvfs2-server.conf-master-pvfs.
>>>> Its contents are:
>>>> ############################
>>>> StorageSpace /pvfs2-storage-space
>>>> HostID "tcp://master-pvfs:3334"
>>>> LogFile /tmp/pvfs2-server.log
>>>> ############################
>>>>
>>>> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
>>>> example, is the following:
>>>> ############################
>>>> StorageSpace /pvfs2-storage-space
>>>> HostID "tcp://node1-pvfs:3334"
>>>> LogFile /tmp/pvfs2-server.log
>>>> ############################
>>>>
>>>> Now, this /pvfs2-storage-space is unfortunately directly on /,
>>>> so the theory that it was mounted at the wrong time
>>>> has to be discarded.
>>>
>>> In the directory listing you gave us for node1 /pvfs2-storage-space,
>>> there's a lost+found directory.  That only appears if you've mounted
>>> another volume into that directory.  My guess is that for the master
>>> node, you've managed to somehow create part of the storage space
>>> before mounting something to /pvfs2-storage-space, and the rest was
>>> created after.  You're only seeing what was created before the
>>> mount.  That's just a guess though.  Can you send us the output of
>>> 'mount' on node1 and master?
>>>
>>> -sam
>>>
>>>>
>>>> On the nodes, by contrast, /pvfs2-storage-space is on a mounted
>>>> filesystem, /dev/md1,
>>>> and there everything apparently works, so it seems to me that
>>>> the problem really lies
>>>> with the master node, which is the metadata server.
>>>>
>>>> The suggestion in the pvfs2-server log to use the -f option looks
>>>> very dangerous to me. Or is it safe in the case of the metadata
>>>> server, in the sense that it will reconstruct the data from the
>>>> IO nodes? I also cannot understand why all the storage spaces have
>>>> the directory "744468fe" in common, while the master has nothing
>>>> besides this empty directory.
>>>>
>>>> Even if the pvfs2-server process had been killed uncleanly
>>>> on the master (the metadata server),
>>>> it would not (I assume) have been able to delete data in the
>>>> storage directory...
>>>>
>>>> So this absence of data in /pvfs2-storage-space on the metadata
>>>> server is both disconcerting and confusing...
>>>>
>>>> Hope this mail will help us to proceed further.
>>>>
>>>> Best Regards
>>>> Raimondo
>>>>
>>>> Rob Ross wrote:
>>>>> Hi Raimondo,
>>>>>
>>>>> Two things. One, there is a second config file around that
>>>>> specifies the storage directory etc. You should be able to find it
>>>>> in /etc/ also. Please send that to us.
>>>>>
>>>>> An idea is that perhaps /pvfs2-storage-space is a mounted file
>>>>> system, and that somehow it is getting mounted *after* the server
>>>>> is started? Just a blind guess. If you try to start the service
>>>>> after the system has finished booting, does it do the same thing?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rob
>>>>>
>>>>> Raimondo Giammanco wrote:
>>>>>> Hello, there.
>>>>>>
>>>>>>  I am coming here seeking words of wisdom. I have searched the
>>>>>> web and this list but cannot find any useful information, so I
>>>>>> am posting here.
>>>>>> I apologize if the answer to this question has already been
>>>>>> provided and I could not find it.
>>>>>>
>>>>>> I have a problem with a pvfs2 installation that was set up
>>>>>> by a third person. The cluster was shut down cleanly for
>>>>>> scheduled maintenance
>>>>>> on the power lines, and I cannot bring pvfs2 up again.
>>>>>>
>>>>>> Here is the description.
>>>>>>
>>>>>> The cluster consists of a frontend and 9 nodes.
>>>>>>
>>>>>> As far as I understand, the frontend is the metadata server and
>>>>>> the nodes are IO servers, according to the /etc/pvfs2-fs.conf
>>>>>> file shown below:
>>>>>>
>>>>>> ####################
>>>>>> <Defaults>
>>>>>>         UnexpectedRequests 50
>>>>>>         EventLogging none
>>>>>>         LogStamp datetime
>>>>>>         BMIModules bmi_tcp
>>>>>>         FlowModules flowproto_multiqueue
>>>>>>         PerfUpdateInterval 1000
>>>>>>         ServerJobBMITimeoutSecs 30
>>>>>>         ServerJobFlowTimeoutSecs 30
>>>>>>         ClientJobBMITimeoutSecs 300
>>>>>>         ClientJobFlowTimeoutSecs 300
>>>>>>         ClientRetryLimit 5
>>>>>>         ClientRetryDelayMilliSecs 2000
>>>>>> </Defaults>
>>>>>>
>>>>>> <Aliases>
>>>>>>         Alias master-pvfs tcp://master-pvfs:3334
>>>>>>         Alias node1-pvfs tcp://node1-pvfs:3334
>>>>>>         Alias node2-pvfs tcp://node2-pvfs:3334
>>>>>>         Alias node3-pvfs tcp://node3-pvfs:3334
>>>>>>         Alias node4-pvfs tcp://node4-pvfs:3334
>>>>>>         Alias node5-pvfs tcp://node5-pvfs:3334
>>>>>>         Alias node6-pvfs tcp://node6-pvfs:3334
>>>>>>         Alias node7-pvfs tcp://node7-pvfs:3334
>>>>>>         Alias node8-pvfs tcp://node8-pvfs:3334
>>>>>>         Alias node9-pvfs tcp://node9-pvfs:3334
>>>>>> </Aliases>
>>>>>>
>>>>>> <Filesystem>
>>>>>>         Name pvfs2-fs
>>>>>>         ID 1950640382
>>>>>>         RootHandle 1048576
>>>>>>         <MetaHandleRanges>
>>>>>>                 Range master-pvfs 4-429496732
>>>>>>         </MetaHandleRanges>
>>>>>>         <DataHandleRanges>
>>>>>>                 Range node1-pvfs 429496733-858993461
>>>>>>                 Range node2-pvfs 858993462-1288490190
>>>>>>                 Range node3-pvfs 1288490191-1717986919
>>>>>>                 Range node4-pvfs 1717986920-2147483648
>>>>>>                 Range node5-pvfs 2147483649-2576980377
>>>>>>                 Range node6-pvfs 2576980378-3006477106
>>>>>>                 Range node7-pvfs 3006477107-3435973835
>>>>>>                 Range node8-pvfs 3435973836-3865470564
>>>>>>                 Range node9-pvfs 3865470565-4294967293
>>>>>>         </DataHandleRanges>
>>>>>>         <StorageHints>
>>>>>>                 TroveSyncMeta yes
>>>>>>                 TroveSyncData no
>>>>>>         </StorageHints>
>>>>>> </Filesystem>
>>>>>> ####################
>>>>>>
>>>>>> The nodes are apparently working correctly: at boot the
>>>>>> /etc/init.d/pvfs2
>>>>>> script ran and the log file (/tmp/pvfs2-server.log) shows, for a
>>>>>> node:
>>>>>> ####################
>>>>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>>>>> ####################
>>>>>>
>>>>>> On the master, instead, it gives:
>>>>>> ####################
>>>>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>>>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>>>>> [E 10/09 11:09]
>>>>>> ***********************************************
>>>>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>>>>
>>>>>> [E 10/09 11:09] Storage initialization failed.  The most common
>>>>>> reason
>>>>>> for this is that the storage space has not yet been
>>>>>> created or is located on a partition that has not yet
>>>>>> been mounted.  If you'd like to create the storage space,
>>>>>> re-run this program with a -f option.
>>>>>> [E 10/09 11:09]
>>>>>> ***********************************************
>>>>>> [E 10/09 11:09] Error: Could not initialize server interfaces;
>>>>>> aborting.
>>>>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>>>>> ####################
>>>>>>
>>>>>> Now, the storage space on the nodes is populated:
>>>>>> ####################
>>>>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>>>>> 744468fe  collections.db  lost+found  storage_attributes.db
>>>>>> ####################
>>>>>> but on the master (frontend) it is not:
>>>>>> ####################
>>>>>> [root@master ~]# ls /pvfs2-storage-space/
>>>>>> 744468fe
>>>>>> ####################
>>>>>>
>>>>>> Can anyone point me in the right direction?
>>>>>>
>>>>>> Thanks Again
>>>>>>
>>>>>> Raimondo
>>>>>> _______________________________________________
>>>>>> Pvfs2-users mailing list
>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>
>>>>
>>>
>>
>>



