[Pvfs2-users] Problem with restarting pvfs on a cluster
Rob Ross
rross at mcs.anl.gov
Wed Oct 10 08:28:46 EDT 2007
Hi,
The "-f" option will indeed destroy things on that node. You should not
do that.
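
Before anyone goes near -f, it is worth confirming what is actually in the storage directory. Here is a minimal sketch in Python. It assumes that an initialized storage space holds collections.db and storage_attributes.db, which is taken from the node1 listing quoted below and not from any PVFS2 documentation. The function name is made up for illustration.

```python
import os
import tempfile

# Assumption: these names come from the 'ls' output on node1 in this thread,
# not from PVFS2 documentation.
EXPECTED_FILES = ("collections.db", "storage_attributes.db")


def missing_storage_files(storage_dir, expected=EXPECTED_FILES):
    """Return the expected entries that are absent from storage_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(storage_dir, name))]


# Demo on a throwaway directory that mimics the master:
# only the 744468fe directory is present.
with tempfile.TemporaryDirectory() as demo:
    os.mkdir(os.path.join(demo, "744468fe"))
    demo_missing = missing_storage_files(demo)

print(demo_missing)
```

On the real machines you would call missing_storage_files("/pvfs2-storage-space"). A non-empty result on the master only tells you the files are not visible there right now. It does not tell you where they went.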
Rob
Raimondo Giammanco wrote:
> Hello Mr. Lang,
>
> As far as I understand, on the master /pvfs2-storage-space is
> not a mount point. /etc/fstab has no mention of it,
> and the directory it contains (744468fe) has a timestamp
> from the day we had to shut down the master, so
> I do not think anything was mounted there.
>
> So, I am fairly certain /pvfs2-storage-space on the master was
> related to the metadata, but it is empty.
>
> If I were to initialize it with the -f option, would it afterwards
> reconstruct the data from the IO nodes, where all seems correct and
> the pvfs2-server process started correctly?
>
> This seems rather risky to me.
>
> Thanks for your help.
>
> Raimondo
>
>
> Sam Lang wrote:
>>
>> On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:
>>
>>> Hello Mr. Lang,
>>>
>>> the master is a different unit type, different from the nodes that are
>>> blades in a rack mounted cluster.
>>>
>>> The mount command provides on the master:
>>> ##################
>>> /dev/sda1 on / type ext3 (rw)
>>> none on /proc type proc (rw)
>>> none on /sys type sysfs (rw)
>>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>>> usbfs on /proc/bus/usb type usbfs (rw)
>>> none on /dev/shm type tmpfs (rw)
>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>>> nfsd on /proc/fs/nfsd type nfsd (rw)
>>> ##################
>>>
>>>
>>> while on the node it is
>>> ##################
>>> /dev/ram0 on / type ext2 (rw)
>>> none on /proc type proc (rw)
>>> none on /sys type sysfs (rw)
>>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>>> usbfs on /proc/bus/usb type usbfs (rw)
>>> none on /dev/shm type tmpfs (rw)
>>> /dev/md0 on /tmp type ext3 (rw)
>>> /dev/md1 on /pvfs2-storage-space type ext3 (rw)
>>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>>> 10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
>>> 10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
>>> 10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
>>> nfsd on /proc/fs/nfsd type nfsd (rw)
>>> #####################
>>>
>>>
>>> The difference is, I believe, that the master has a hardware raid,
>>
>> Is the hardware raid /dev/sda1 mounted to / ? If not, maybe the
>> hardware raid on the master needs to be mounted to /pvfs2-storage-space?
>>
>>> while the nodes have 2 small hd in software raid for the system and
>>> temporary data, and 2 big ones, still in software raid, for pvfs.
>>
>> Ok that explains the lost+found. FYI, while the /pvfs2-storage-space
>> may exist as a directory in /, it can also be a mountpoint for
>> something else, so its contents may not be visible (at least the
>> contents you would expect) if you haven't mounted everything properly.
>>
>> -sam
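
The mount question can be checked mechanically from the 'mount' output posted above. This is a rough sketch, not a robust parser. It assumes the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line shape seen in this thread and mount points without spaces. The function name is hypothetical, and the sample text is a subset of the lines above.

```python
import os


def mounted_at(mount_output, path):
    """Return (device, fstype) for the entry mounted exactly at path, or None."""
    found = None
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            if parts[2] == path:
                # Keep the last match: a later mount hides an earlier one.
                found = (parts[0], parts[4])
    return found


NODE_MOUNTS = """\
/dev/ram0 on / type ext2 (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
"""

MASTER_MOUNTS = """\
/dev/sda1 on / type ext3 (rw)
none on /dev/shm type tmpfs (rw)
"""

node_result = mounted_at(NODE_MOUNTS, "/pvfs2-storage-space")
master_result = mounted_at(MASTER_MOUNTS, "/pvfs2-storage-space")
print(node_result)    # ('/dev/md1', 'ext3')
print(master_result)  # None
```

On a live machine, os.path.ismount("/pvfs2-storage-space") asks the same question directly without parsing anything.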
>>
>>>
>>> Regards,
>>> Raimondo
>>>
>>>
>>>>
>>>> On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
>>>>
>>>>> Hello Mr. Ross,
>>>>>
>>>>> thanks for your prompt reply.
>>>>>
>>>>> I believe the config file you mention is (for my case)
>>>>> /etc/pvfs2-server.conf-master-pvfs.
>>>>> Its contents are:
>>>>> ############################
>>>>> StorageSpace /pvfs2-storage-space
>>>>> HostID "tcp://master-pvfs:3334"
>>>>> LogFile /tmp/pvfs2-server.log
>>>>> ############################
>>>>>
>>>>> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
>>>>> example, is the following:
>>>>> ############################
>>>>> StorageSpace /pvfs2-storage-space
>>>>> HostID "tcp://node1-pvfs:3334"
>>>>> LogFile /tmp/pvfs2-server.log
>>>>> ############################
>>>>>
>>>>> Now, this /pvfs2-storage-space is unfortunately directly on /,
>>>>> so the wrong mount timing theory unfortunately has to be discarded.
>>>>
>>>> In the directory listing you gave us for node1 /pvfs2-storage-space,
>>>> there's a lost+found directory. That only appears if you've mounted
>>>> another volume into that directory. My guess is that for the master
>>>> node, you've managed to somehow create part of the storage space
>>>> before mounting something to /pvfs2-storage-space, and the rest was
>>>> created after. You're only seeing what was created before the
>>>> mount. That's just a guess though. Can you send us the output of
>>>> 'mount' on node1 and master?
>>>>
>>>> -sam
>>>>
>>>>>
>>>>> On the nodes, by contrast, /pvfs2-storage-space is on a mounted
>>>>> filesystem, /dev/md1, and there everything apparently goes right,
>>>>> so it seems to me that the problem really is with the master node
>>>>> and metadata server.
>>>>>
>>>>> The suggestion in the pvfs2-server log to use the -f option looks
>>>>> very dangerous to me. Or is it OK for the metadata server, in the
>>>>> sense that it will reconstruct the data from the IO nodes? I also
>>>>> cannot understand why the different storage spaces have the same
>>>>> directory, "744468fe", in common, while the master has nothing
>>>>> besides this empty directory.
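
On the shared directory name: 744468fe looks like the filesystem ID from /etc/pvfs2-fs.conf (ID 1950640382) written in hexadecimal, which would explain why every server has a directory with that name. That the directory is a per-filesystem directory named after the ID is an inference from the two numbers matching. Nothing in this thread confirms it. The arithmetic itself is easy to check:

```python
FS_ID = 1950640382     # "ID" from the <Filesystem> section of /etc/pvfs2-fs.conf
DIR_NAME = "744468fe"  # directory seen under /pvfs2-storage-space on master and node1

# Format the decimal ID as 8 lowercase hex digits and compare.
fs_id_hex = format(FS_ID, "08x")
print(fs_id_hex, fs_id_hex == DIR_NAME)  # 744468fe True
```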
>>>>>
>>>>> Even if the pvfs2-server process had been killed uncleanly
>>>>> on the master and metadata server,
>>>>> it would not have been able (I assume) to delete data in the
>>>>> storage directory...
>>>>>
>>>>> So this absence of data in /pvfs2-storage-space for the metadata
>>>>> server is both disconcerting and confusing...
>>>>>
>>>>> Hope this mail will help us to proceed further.
>>>>>
>>>>> Best Regards
>>>>> Raimondo
>>>>>
>>>>> Rob Ross wrote:
>>>>>> Hi Raimondo,
>>>>>>
>>>>>> Two things. One, there is a second config file around that
>>>>>> specifies the storage directory etc. You should be able to find it
>>>>>> in /etc/ also. Please send that to us.
>>>>>>
>>>>>> An idea is that perhaps /pvfs2-storage-space is a mounted file
>>>>>> system, and that somehow it is getting mounted *after* the server
>>>>>> is started? Just a blind guess. If you try to start the service
>>>>>> after the system has finished booting, does it do the same thing?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> Raimondo Giammanco wrote:
>>>>>>> Hello, there.
>>>>>>>
>>>>>>> I am coming here seeking words of wisdom. I have searched the
>>>>>>> interweb and this list but cannot seem to find useful information,
>>>>>>> so I post here. I apologize if the answer to the question has
>>>>>>> already been provided and I could not find it.
>>>>>>>
>>>>>>> I have a problem with a pvfs2 installation that was set up by a
>>>>>>> third party. The cluster was shut down cleanly for scheduled
>>>>>>> maintenance on the power lines, and I cannot bring pvfs2 up again.
>>>>>>>
>>>>>>> Here is the description.
>>>>>>>
>>>>>>> There is a cluster with a frontend and 9 nodes.
>>>>>>>
>>>>>>> As far as I understand, the frontend is a metadata server, and the
>>>>>>> nodes are IO servers, as per the /etc/pvfs2-fs.conf file I present
>>>>>>> here below:
>>>>>>>
>>>>>>> ####################
>>>>>>> <Defaults>
>>>>>>> UnexpectedRequests 50
>>>>>>> EventLogging none
>>>>>>> LogStamp datetime
>>>>>>> BMIModules bmi_tcp
>>>>>>> FlowModules flowproto_multiqueue
>>>>>>> PerfUpdateInterval 1000
>>>>>>> ServerJobBMITimeoutSecs 30
>>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>>> ClientJobBMITimeoutSecs 300
>>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>>> ClientRetryLimit 5
>>>>>>> ClientRetryDelayMilliSecs 2000
>>>>>>> </Defaults>
>>>>>>>
>>>>>>> <Aliases>
>>>>>>> Alias master-pvfs tcp://master-pvfs:3334
>>>>>>> Alias node1-pvfs tcp://node1-pvfs:3334
>>>>>>> Alias node2-pvfs tcp://node2-pvfs:3334
>>>>>>> Alias node3-pvfs tcp://node3-pvfs:3334
>>>>>>> Alias node4-pvfs tcp://node4-pvfs:3334
>>>>>>> Alias node5-pvfs tcp://node5-pvfs:3334
>>>>>>> Alias node6-pvfs tcp://node6-pvfs:3334
>>>>>>> Alias node7-pvfs tcp://node7-pvfs:3334
>>>>>>> Alias node8-pvfs tcp://node8-pvfs:3334
>>>>>>> Alias node9-pvfs tcp://node9-pvfs:3334
>>>>>>> </Aliases>
>>>>>>>
>>>>>>> <Filesystem>
>>>>>>> Name pvfs2-fs
>>>>>>> ID 1950640382
>>>>>>> RootHandle 1048576
>>>>>>> <MetaHandleRanges>
>>>>>>> Range master-pvfs 4-429496732
>>>>>>> </MetaHandleRanges>
>>>>>>> <DataHandleRanges>
>>>>>>> Range node1-pvfs 429496733-858993461
>>>>>>> Range node2-pvfs 858993462-1288490190
>>>>>>> Range node3-pvfs 1288490191-1717986919
>>>>>>> Range node4-pvfs 1717986920-2147483648
>>>>>>> Range node5-pvfs 2147483649-2576980377
>>>>>>> Range node6-pvfs 2576980378-3006477106
>>>>>>> Range node7-pvfs 3006477107-3435973835
>>>>>>> Range node8-pvfs 3435973836-3865470564
>>>>>>> Range node9-pvfs 3865470565-4294967293
>>>>>>> </DataHandleRanges>
>>>>>>> <StorageHints>
>>>>>>> TroveSyncMeta yes
>>>>>>> TroveSyncData no
>>>>>>> </StorageHints>
>>>>>>> </Filesystem>
>>>>>>> ####################
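
Since the config is posted in full, one thing that can be ruled out quickly is a slip in the handle ranges. Here is a small sketch that parses the Range lines above and checks that consecutive ranges abut exactly. The function names are hypothetical, and whether PVFS2 itself requires contiguous ranges is not something this thread establishes. The check only shows that the ranges as posted have no gaps or overlaps.

```python
import re

CONF_RANGES = """\
Range master-pvfs 4-429496732
Range node1-pvfs 429496733-858993461
Range node2-pvfs 858993462-1288490190
Range node3-pvfs 1288490191-1717986919
Range node4-pvfs 1717986920-2147483648
Range node5-pvfs 2147483649-2576980377
Range node6-pvfs 2576980378-3006477106
Range node7-pvfs 3006477107-3435973835
Range node8-pvfs 3435973836-3865470564
Range node9-pvfs 3865470565-4294967293
"""


def parse_ranges(text):
    """Return [(alias, first, last)] for every 'Range alias first-last' line."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in re.finditer(r"Range\s+(\S+)\s+(\d+)-(\d+)", text)]


def gaps_or_overlaps(ranges):
    """Return alias pairs whose neighbouring ranges do not abut exactly."""
    ordered = sorted(ranges, key=lambda r: r[1])
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])
            if a[2] + 1 != b[1]]


ranges = parse_ranges(CONF_RANGES)
problems = gaps_or_overlaps(ranges)
print(len(ranges), problems)  # 10 []
```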
>>>>>>>
>>>>>>> The nodes are apparently working correctly, at boot the
>>>>>>> /etc/init.d/pvfs2 script worked and the log file
>>>>>>> (/tmp/pvfs2-server.log) gives me for a node:
>>>>>>> ####################
>>>>>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>>>>>> ####################
>>>>>>>
>>>>>>> on the master instead, it gives
>>>>>>> ####################
>>>>>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>>>>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>>>>>> [E 10/09 11:09] ***********************************************
>>>>>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>>>>>
>>>>>>> [E 10/09 11:09] Storage initialization failed. The most common reason
>>>>>>> for this is that the storage space has not yet been
>>>>>>> created or is located on a partition that has not yet
>>>>>>> been mounted. If you'd like to create the storage space,
>>>>>>> re-run this program with a -f option.
>>>>>>> [E 10/09 11:09] ***********************************************
>>>>>>> [E 10/09 11:09] Error: Could not initialize server interfaces; aborting.
>>>>>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>>>>>> ####################
>>>>>>>
>>>>>>> Now, the storage space on the nodes is full:
>>>>>>> ####################
>>>>>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>>>>>> 744468fe collections.db lost+found storage_attributes.db
>>>>>>> ####################
>>>>>>> on the master (frontend) not:
>>>>>>> ####################
>>>>>>> [root@master ~]# ls /pvfs2-storage-space/
>>>>>>> 744468fe
>>>>>>> ####################
>>>>>>>
>>>>>>> Can anyone point me in the right direction?
>>>>>>>
>>>>>>> Thanks Again
>>>>>>>
>>>>>>> Raimondo
>>>>>>> _______________________________________________
>>>>>>> Pvfs2-users mailing list
>>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>>
>>>>>
>>>>
>>>
>>>
>
>