[Pvfs2-users] Problem with restarting pvfs on a cluster
Raimondo Giammanco
giamma at vki.ac.be
Wed Oct 10 04:51:45 EDT 2007
Hello Mr. Lang,
As far as I understand, /pvfs2-storage-space on the master is
not a mount point. /etc/fstab does not mention it,
and the directory it contains (744468fe) has a timestamp
from the day we had to shut down the master, so
I do not think anything was ever mounted there.
So I am fairly certain that /pvfs2-storage-space on the master
held the metadata, but apart from that directory it is empty.
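
The two checks described above (no entry in /etc/fstab, nothing mounted on the directory) can be made explicit. The following is only a minimal sketch: the function names and the sample fstab text are hypothetical, and the sample is modeled loosely on the master's mount output quoted below, not copied from the real /etc/fstab.

```python
import os


def fstab_mount_points(fstab_text):
    """Return the mount points (second field) listed in fstab-formatted text."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            points.append(fields[1])
    return points


def describe(path, fstab_text):
    """Report whether path is listed in fstab and whether it is mounted now."""
    return {
        "in_fstab": path in fstab_mount_points(fstab_text),
        "mounted_now": os.path.ismount(path),
    }


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
# device   mountpoint  type  options   dump pass
/dev/sda1  /           ext3  defaults  1 1
none       /proc       proc  defaults  0 0
"""

if __name__ == "__main__":
    print(describe("/pvfs2-storage-space", SAMPLE_FSTAB))
```

On the real machine one would pass the contents of /etc/fstab instead of the sample text. Note that a volume can still be mounted by hand or by an init script without ever appearing in /etc/fstab, so a negative fstab result alone does not settle the question.
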
If I were to initialize it with the -f option, would the server afterwards
reconstruct the data from the IO nodes, where everything seems correct and
the pvfs2-server process started correctly?
This seems rather risky to me.
Thanks for your help.
Raimondo
Sam Lang wrote:
>
> On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:
>
>> Hello Mr. Lang,
>>
>> the master is a different type of machine from the nodes, which are
>> blades in a rack-mounted cluster.
>>
>> On the master, the mount command gives:
>> ##################
>> /dev/sda1 on / type ext3 (rw)
>> none on /proc type proc (rw)
>> none on /sys type sysfs (rw)
>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>> usbfs on /proc/bus/usb type usbfs (rw)
>> none on /dev/shm type tmpfs (rw)
>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
>> ##################
>>
>>
>> while on the node it gives:
>> ##################
>> /dev/ram0 on / type ext2 (rw)
>> none on /proc type proc (rw)
>> none on /sys type sysfs (rw)
>> none on /dev/pts type devpts (rw,gid=5,mode=620)
>> usbfs on /proc/bus/usb type usbfs (rw)
>> none on /dev/shm type tmpfs (rw)
>> /dev/md0 on /tmp type ext3 (rw)
>> /dev/md1 on /pvfs2-storage-space type ext3 (rw)
>> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
>> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> 10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
>> 10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
>> 10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
>> #####################
>>
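
The two listings above differ in exactly one relevant line: only the node has something mounted on /pvfs2-storage-space. A small parser makes that comparison mechanical. This is a sketch under the assumption that `mount` prints lines in the `DEVICE on PATH type FSTYPE (OPTIONS)` form shown in this thread; the names `parse_mount`, `MASTER` and `NODE` are made up for the example, and the sample strings are excerpts of the output above.

```python
import re

# One line of classic `mount` output: DEVICE on PATH type FSTYPE (OPTIONS)
MOUNT_LINE = re.compile(
    r"^(?P<dev>\S+) on (?P<path>\S+) type (?P<fs>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mount(output):
    """Map each mount point to a (device, fstype) pair."""
    table = {}
    for line in output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            table[match.group("path")] = (match.group("dev"), match.group("fs"))
    return table


# Excerpts of the output quoted in this thread.
MASTER = """\
/dev/sda1 on / type ext3 (rw)
none on /proc type proc (rw)
"""

NODE = """\
/dev/ram0 on / type ext2 (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
"""

if __name__ == "__main__":
    for label, text in (("master", MASTER), ("node", NODE)):
        print(label, parse_mount(text).get("/pvfs2-storage-space"))
```
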
>>
>> The difference is, I believe, that the master has a hardware raid,
>
> Is the hardware raid /dev/sda1 mounted to / ? If not, maybe the
> hardware raid on the master needs to be mounted to /pvfs2-storage-space?
>
>> while the nodes have 2 small hd in software raid for the system and
>> temporary data, and 2 big ones, still in software raid, for pvfs.
>
> Ok that explains the lost+found. FYI, while the /pvfs2-storage-space
> may exist as a directory in /, it can also be a mountpoint for
> something else, so its contents may not be visible (at least the
> contents you would expect) if you haven't mounted everything properly.
>
> -sam
>
>>
>> Regards,
>> Raimondo
>>
>>
>>>
>>> On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
>>>
>>>> Hello Mr. Ross,
>>>>
>>>> thanks for your prompt reply.
>>>>
>>>> I believe the config file you mention is (in my case)
>>>> /etc/pvfs2-server.conf-master-pvfs.
>>>> Its contents are:
>>>> ############################
>>>> StorageSpace /pvfs2-storage-space
>>>> HostID "tcp://master-pvfs:3334"
>>>> LogFile /tmp/pvfs2-server.log
>>>> ############################
>>>>
>>>> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
>>>> example, is the following:
>>>> ############################
>>>> StorageSpace /pvfs2-storage-space
>>>> HostID "tcp://node1-pvfs:3334"
>>>> LogFile /tmp/pvfs2-server.log
>>>> ############################
>>>>
>>>> Now, this /pvfs2-storage-space is directly on /,
>>>> so the theory of a badly timed mount
>>>> unfortunately has to be discarded.
>>>
>>> In the directory listing you gave us for node1 /pvfs2-storage-space,
>>> there's a lost+found directory. That only appears if you've mounted
>>> another volume into that directory. My guess is that for the master
>>> node, you've managed to somehow create part of the storage space
>>> before mounting something to /pvfs2-storage-space, and the rest was
>>> created after. You're only seeing what was created before the
>>> mount. That's just a guess though. Can you send us the output of
>>> 'mount' on node1 and master?
>>>
>>> -sam
>>>
>>>>
>>>> On the nodes, by contrast, /pvfs2-storage-space is on a mounted
>>>> filesystem, /dev/md1, but there everything apparently works, so it
>>>> seems to me that the problem really lies with the master node, that
>>>> is, the metadata server.
>>>>
>>>> The suggestion in the pvfs2-server log to use the -f option looks
>>>> very dangerous to me. Or is it fine for a metadata server,
>>>> in the sense that it will reconstruct the data from the IO nodes?
>>>> I cannot understand why the different storage spaces all contain
>>>> the same directory, "744468fe",
>>>> while the master has nothing besides this empty directory.
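
One observation on the shared "744468fe" directory: it is the hexadecimal form of the filesystem ID (1950640382) given in the pvfs2-fs.conf quoted further down in this thread. The snippet below only verifies that arithmetic. That the server derives the directory name from the ID in this way is an inference from the numbers, not something confirmed in this thread.

```python
# "ID" from the <Filesystem> section of /etc/pvfs2-fs.conf
FS_ID = 1950640382

# Eight lowercase hex digits, zero padded
dirname = format(FS_ID, "08x")
print(dirname)  # 744468fe
```

If the inference holds, every server belonging to the same filesystem would be expected to carry a directory with this name, which would explain why it appears on the master and on the nodes alike.
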
>>>>
>>>> Even if the pvfs2-server process had not been killed cleanly
>>>> on the master (the metadata server),
>>>> it would not (I assume) have been able to delete data in the
>>>> storage directory...
>>>>
>>>> So this absence of data in /pvfs2-storage-space for the metadata
>>>> server is both disconcerting and confusing...
>>>>
>>>> Hope this mail will help us to proceed further.
>>>>
>>>> Best Regards
>>>> Raimondo
>>>>
>>>> Rob Ross wrote:
>>>>> Hi Raimondo,
>>>>>
>>>>> Two things. One, there is a second config file around that
>>>>> specifies the storage directory etc. You should be able to find it
>>>>> in /etc/ also. Please send that to us.
>>>>>
>>>>> An idea is that perhaps /pvfs2-storage-space is a mounted file
>>>>> system, and that somehow it is getting mounted *after* the server
>>>>> is started? Just a blind guess. If you try to start the service
>>>>> after the system has finished booting, does it do the same thing?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Rob
>>>>>
>>>>> Raimondo Giammanco wrote:
>>>>>> Hello, there.
>>>>>>
>>>>>> I am coming here seeking words of wisdom. I have searched the
>>>>>> web and this list but cannot find any useful information, so I
>>>>>> post here.
>>>>>> I apologize if the answer to the question has already been
>>>>>> provided and I
>>>>>> could not find it.
>>>>>>
>>>>>> I have a problem with a pvfs2 installation that was set up
>>>>>> by a third party. The cluster was shut down cleanly for scheduled
>>>>>> maintenance on the power lines, and I cannot bring pvfs2 up again.
>>>>>>
>>>>>> Here is the description.
>>>>>>
>>>>>> There is a cluster consisting of a frontend and 9 nodes.
>>>>>>
>>>>>> As far as I understand, the frontend is the metadata server and the
>>>>>> nodes are IO servers, according to the /etc/pvfs2-fs.conf file
>>>>>> shown below:
>>>>>>
>>>>>> ####################
>>>>>> <Defaults>
>>>>>> UnexpectedRequests 50
>>>>>> EventLogging none
>>>>>> LogStamp datetime
>>>>>> BMIModules bmi_tcp
>>>>>> FlowModules flowproto_multiqueue
>>>>>> PerfUpdateInterval 1000
>>>>>> ServerJobBMITimeoutSecs 30
>>>>>> ServerJobFlowTimeoutSecs 30
>>>>>> ClientJobBMITimeoutSecs 300
>>>>>> ClientJobFlowTimeoutSecs 300
>>>>>> ClientRetryLimit 5
>>>>>> ClientRetryDelayMilliSecs 2000
>>>>>> </Defaults>
>>>>>>
>>>>>> <Aliases>
>>>>>> Alias master-pvfs tcp://master-pvfs:3334
>>>>>> Alias node1-pvfs tcp://node1-pvfs:3334
>>>>>> Alias node2-pvfs tcp://node2-pvfs:3334
>>>>>> Alias node3-pvfs tcp://node3-pvfs:3334
>>>>>> Alias node4-pvfs tcp://node4-pvfs:3334
>>>>>> Alias node5-pvfs tcp://node5-pvfs:3334
>>>>>> Alias node6-pvfs tcp://node6-pvfs:3334
>>>>>> Alias node7-pvfs tcp://node7-pvfs:3334
>>>>>> Alias node8-pvfs tcp://node8-pvfs:3334
>>>>>> Alias node9-pvfs tcp://node9-pvfs:3334
>>>>>> </Aliases>
>>>>>>
>>>>>> <Filesystem>
>>>>>> Name pvfs2-fs
>>>>>> ID 1950640382
>>>>>> RootHandle 1048576
>>>>>> <MetaHandleRanges>
>>>>>> Range master-pvfs 4-429496732
>>>>>> </MetaHandleRanges>
>>>>>> <DataHandleRanges>
>>>>>> Range node1-pvfs 429496733-858993461
>>>>>> Range node2-pvfs 858993462-1288490190
>>>>>> Range node3-pvfs 1288490191-1717986919
>>>>>> Range node4-pvfs 1717986920-2147483648
>>>>>> Range node5-pvfs 2147483649-2576980377
>>>>>> Range node6-pvfs 2576980378-3006477106
>>>>>> Range node7-pvfs 3006477107-3435973835
>>>>>> Range node8-pvfs 3435973836-3865470564
>>>>>> Range node9-pvfs 3865470565-4294967293
>>>>>> </DataHandleRanges>
>>>>>> <StorageHints>
>>>>>> TroveSyncMeta yes
>>>>>> TroveSyncData no
>>>>>> </StorageHints>
>>>>>> </Filesystem>
>>>>>> ####################
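
The handle ranges in the configuration above can be sanity-checked with a few lines of arithmetic: each server should own a distinct block of handles, and the root handle should fall inside the metadata server's block. The sketch below copies the `Range` lines and the `RootHandle` value from the file; the helper names `parse_ranges` and `check_ranges` are made up for this example and are not part of PVFS2.

```python
import re

CONF = """\
Range master-pvfs 4-429496732
Range node1-pvfs 429496733-858993461
Range node2-pvfs 858993462-1288490190
Range node3-pvfs 1288490191-1717986919
Range node4-pvfs 1717986920-2147483648
Range node5-pvfs 2147483649-2576980377
Range node6-pvfs 2576980378-3006477106
Range node7-pvfs 3006477107-3435973835
Range node8-pvfs 3435973836-3865470564
Range node9-pvfs 3865470565-4294967293
"""
ROOT_HANDLE = 1048576

RANGE_LINE = re.compile(r"^[ \t]*Range[ \t]+(\S+)[ \t]+(\d+)-(\d+)[ \t]*$", re.MULTILINE)


def parse_ranges(conf_text):
    """Return (alias, low, high) for every Range line."""
    return [(name, int(lo), int(hi)) for name, lo, hi in RANGE_LINE.findall(conf_text)]


def check_ranges(ranges):
    """Return per-server range sizes and any neighbours that are not contiguous."""
    ordered = sorted(ranges, key=lambda r: r[1])
    sizes = {name: hi - lo + 1 for name, lo, hi in ordered}
    gaps = [(a[0], b[0]) for a, b in zip(ordered, ordered[1:]) if b[1] != a[2] + 1]
    return sizes, gaps


ranges = parse_ranges(CONF)
sizes, gaps = check_ranges(ranges)
root_owner = [name for name, lo, hi in ranges if lo <= ROOT_HANDLE <= hi]

print(sizes)
print(gaps)
print(root_owner)
```

For this file the ten ranges come out equal in size and contiguous, and the root handle lies in the master-pvfs range, so the configuration itself looks internally consistent.
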
>>>>>>
>>>>>> The nodes are apparently working correctly: at boot the
>>>>>> /etc/init.d/pvfs2 script ran and the log file
>>>>>> (/tmp/pvfs2-server.log) shows, for a node:
>>>>>> ####################
>>>>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>>>>> ####################
>>>>>>
>>>>>> On the master, however, it gives:
>>>>>> ####################
>>>>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>>>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>>>>> [E 10/09 11:09]
>>>>>> ***********************************************
>>>>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>>>>
>>>>>> [E 10/09 11:09] Storage initialization failed. The most common
>>>>>> reason
>>>>>> for this is that the storage space has not yet been
>>>>>> created or is located on a partition that has not yet
>>>>>> been mounted. If you'd like to create the storage space,
>>>>>> re-run this program with a -f option.
>>>>>> [E 10/09 11:09]
>>>>>> ***********************************************
>>>>>> [E 10/09 11:09] Error: Could not initialize server interfaces;
>>>>>> aborting.
>>>>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>>>>> ####################
>>>>>>
>>>>>> Now, the storage space on the nodes is populated:
>>>>>> ####################
>>>>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>>>>> 744468fe collections.db lost+found storage_attributes.db
>>>>>> ####################
>>>>>> but on the master (frontend) it is not:
>>>>>> ####################
>>>>>> [root@master ~]# ls /pvfs2-storage-space/
>>>>>> 744468fe
>>>>>> ####################
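
The difference between the two listings above can be stated precisely by treating the node's directory as a reference and asking what the master lacks. This is a sketch only: it assumes that a metadata server and an IO server share the same top-level layout, which this thread does not confirm, and `missing_on` is a made-up helper name. `lost+found` is ignored because it belongs to the mounted filesystem rather than to the storage space.

```python
def missing_on(reference, candidate, ignore=("lost+found",)):
    """Entries present in the reference listing but absent from the candidate."""
    return sorted(set(reference) - set(candidate) - set(ignore))


# Listings copied from the `ls` output in this thread.
node1 = "744468fe collections.db lost+found storage_attributes.db".split()
master = "744468fe".split()

print(missing_on(node1, master))  # ['collections.db', 'storage_attributes.db']
```

Under that assumption, the two top-level database files are exactly what the master is missing, which is consistent with the "Invalid Storage Space" error in its log.
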
>>>>>>
>>>>>> Can anyone point me in the right direction?
>>>>>>
>>>>>> Thanks Again
>>>>>>
>>>>>> Raimondo
>>>>>> _______________________________________________
>>>>>> Pvfs2-users mailing list
>>>>>> Pvfs2-users at beowulf-underground.org
>>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>
>>>>
>>>
>>
>>