[Pvfs2-users] Problem with restarting pvfs on a cluster
Sam Lang
slang at mcs.anl.gov
Tue Oct 9 14:45:58 EDT 2007
On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:
> Hello Mr. Lang,
>
> the master is a different type of unit from the nodes, which are
> blades in a rack-mounted cluster.
>
> On the master, the mount command reports:
> ##################
> /dev/sda1 on / type ext3 (rw)
> none on /proc type proc (rw)
> none on /sys type sysfs (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> usbfs on /proc/bus/usb type usbfs (rw)
> none on /dev/shm type tmpfs (rw)
> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> nfsd on /proc/fs/nfsd type nfsd (rw)
> ##################
>
>
> while on the node it reports:
> ##################
> /dev/ram0 on / type ext2 (rw)
> none on /proc type proc (rw)
> none on /sys type sysfs (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> usbfs on /proc/bus/usb type usbfs (rw)
> none on /dev/shm type tmpfs (rw)
> /dev/md0 on /tmp type ext3 (rw)
> /dev/md1 on /pvfs2-storage-space type ext3 (rw)
> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> 10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
> 10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
> 10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
> nfsd on /proc/fs/nfsd type nfsd (rw)
> #####################
>
>
> The difference is, I believe, that the master has a hardware raid,
Is the hardware RAID the /dev/sda1 device mounted on / ? If not, maybe
the hardware RAID on the master needs to be mounted on
/pvfs2-storage-space?
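
One way to read that answer off the mount listings quoted above is to
look up which device backs a given path. The following is only a
minimal sketch in Python: the helper names (parse_mounts,
backing_device) are made up for illustration, the listings are
abbreviated copies of the ones in this thread, and mount points
containing spaces are not handled.

```python
# Abbreviated copies of the two `mount` listings quoted in this thread.
MASTER_MOUNTS = """\
/dev/sda1 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/shm type tmpfs (rw)
"""

NODE_MOUNTS = """\
/dev/ram0 on / type ext2 (rw)
none on /proc type proc (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
"""


def parse_mounts(mount_output):
    """Return a list of (device, mount_point) pairs from `mount` output."""
    entries = []
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <device> on <mount_point> type <fstype> (<options>)
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            entries.append((parts[0], parts[2]))
    return entries


def _with_slash(path):
    """Normalize a path so that prefix matching respects path components."""
    return path.rstrip("/") + "/"


def backing_device(mount_output, path):
    """Return (device, mount_point) for the longest mount point containing path."""
    best = None
    for device, mount_point in parse_mounts(mount_output):
        if _with_slash(path).startswith(_with_slash(mount_point)):
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point)
    return best


for name, listing in (("master", MASTER_MOUNTS), ("node", NODE_MOUNTS)):
    device, mount_point = backing_device(listing, "/pvfs2-storage-space")
    print(f"{name}: /pvfs2-storage-space is backed by {device} "
          f"(mounted on {mount_point})")
```

With these listings the master resolves to /dev/sda1 mounted on /, while
the node resolves to /dev/md1 mounted on /pvfs2-storage-space, which
matches the difference between the two machines discussed here.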
> while the nodes have 2 small hard disks in software RAID for the
> system and temporary data, and 2 big ones, also in software RAID,
> for pvfs.
OK, that explains the lost+found. FYI, while /pvfs2-storage-space may
exist as a directory in /, it can also be the mountpoint for another
volume, so the contents you expect may not be visible if that volume
hasn't been mounted properly.
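
A quick way to tell the two cases apart on a running machine is to ask
whether the directory is currently a mountpoint. This is a minimal
sketch using Python's standard os.path.ismount; the function name
describe_storage_dir is made up for illustration, and the path is the
StorageSpace value from the config files in this thread.

```python
import os


def describe_storage_dir(path):
    """Classify path as 'missing', 'mountpoint', or 'plain directory'."""
    if not os.path.isdir(path):
        return "missing"
    # os.path.ismount compares the directory with its parent, so it is
    # True only while another filesystem is mounted on the directory.
    if os.path.ismount(path):
        return "mountpoint"
    return "plain directory"


if __name__ == "__main__":
    storage = "/pvfs2-storage-space"  # StorageSpace from the server config
    state = describe_storage_dir(storage)
    print(f"{storage}: {state}")
    if state != "missing":
        # What is visible depends on whether a volume is mounted right now.
        print("visible entries:", sorted(os.listdir(storage)))
```

If a machine is supposed to keep its storage space on a separate volume
and this reports "plain directory", the entries listed are those of the
underlying directory, not of the volume that should be mounted there.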
-sam
>
> Regards,
> Raimondo
>
>
>>
>> On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
>>
>>> Hello Mr. Ross,
>>>
>>> thanks for your prompt reply.
>>>
>>> I believe the config file you mention is (for my case)
>>> /etc/pvfs2-server.conf-master-pvfs.
>>> Its contents are:
>>> ############################
>>> StorageSpace /pvfs2-storage-space
>>> HostID "tcp://master-pvfs:3334"
>>> LogFile /tmp/pvfs2-server.log
>>> ############################
>>>
>>> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
>>> example, is the following:
>>> ############################
>>> StorageSpace /pvfs2-storage-space
>>> HostID "tcp://node1-pvfs:3334"
>>> LogFile /tmp/pvfs2-server.log
>>> ############################
>>>
>>> Now, this /pvfs2-storage-space is unfortunately directly on /, so
>>> the wrong-mount-timing theory unfortunately has to be discarded.
>>
>> In the directory listing you gave us for node1 /pvfs2-storage-space,
>> there's a lost+found directory. That only appears if you've mounted
>> another volume into that directory. My guess is that for the master
>> node, you've managed to somehow create part of the storage space
>> before mounting something to /pvfs2-storage-space, and the rest was
>> created after. You're only seeing what was created before the
>> mount. That's just a guess though. Can you send us the output of
>> 'mount' on node1 and master?
>>
>> -sam
>>
>>>
>>> On the nodes, instead, /pvfs2-storage-space is on a mounted
>>> filesystem, /dev/md1, and there everything apparently works, so it
>>> seems to me that the problem really is with the master node, i.e.
>>> the metadata server.
>>>
>>> The suggestion in the pvfs2-server log to use the -f option looks
>>> very dangerous to me. Or is it safe for the metadata server, in the
>>> sense that it will reconstruct the data from the IO nodes? I cannot
>>> understand why the different storage spaces have the directory
>>> "744468fe" in common, while the master has nothing besides this
>>> empty directory.
>>>
>>> Even if the pvfs2-server process had been killed uncleanly on the
>>> master and metadata server, it would not (I assume) have been able
>>> to delete data in the storage directory...
>>>
>>> So this absence of data in /pvfs2-storage-space for the metadata
>>> server is both disconcerting and confusing...
>>>
>>> Hope this mail will help us to proceed further.
>>>
>>> Best Regards
>>> Raimondo
>>>
>>> Rob Ross wrote:
>>>> Hi Raimondo,
>>>>
>>>> Two things. One, there is a second config file around that
>>>> specifies the storage directory etc. You should be able to find it
>>>> in /etc/ also. Please send that to us.
>>>>
>>>> An idea is that perhaps /pvfs2-storage-space is a mounted file
>>>> system, and that somehow it is getting mounted *after* the server
>>>> is started? Just a blind guess. If you try to start the service
>>>> after the system has finished booting, does it do the same thing?
>>>>
>>>> Thanks,
>>>>
>>>> Rob
>>>>
>>>> Raimondo Giammanco wrote:
>>>>> Hello, there.
>>>>>
>>>>> I come here seeking words of wisdom. I have searched the web and
>>>>> this list but cannot find any useful information, so I am posting
>>>>> here. I apologize if the answer to this question has already been
>>>>> provided and I could not find it.
>>>>>
>>>>> I have a problem with a pvfs2 installation that was set up by a
>>>>> third party. The cluster was shut down cleanly for scheduled
>>>>> maintenance on the power lines, and I cannot bring pvfs2 up again.
>>>>>
>>>>> Here is the description.
>>>>>
>>>>> There is a cluster with a frontend and 9 nodes.
>>>>>
>>>>> As far as I understand, the frontend is the metadata server and the
>>>>> nodes are IO servers, as per the /etc/pvfs2-fs.conf file I present
>>>>> below:
>>>>>
>>>>> ####################
>>>>> <Defaults>
>>>>> UnexpectedRequests 50
>>>>> EventLogging none
>>>>> LogStamp datetime
>>>>> BMIModules bmi_tcp
>>>>> FlowModules flowproto_multiqueue
>>>>> PerfUpdateInterval 1000
>>>>> ServerJobBMITimeoutSecs 30
>>>>> ServerJobFlowTimeoutSecs 30
>>>>> ClientJobBMITimeoutSecs 300
>>>>> ClientJobFlowTimeoutSecs 300
>>>>> ClientRetryLimit 5
>>>>> ClientRetryDelayMilliSecs 2000
>>>>> </Defaults>
>>>>>
>>>>> <Aliases>
>>>>> Alias master-pvfs tcp://master-pvfs:3334
>>>>> Alias node1-pvfs tcp://node1-pvfs:3334
>>>>> Alias node2-pvfs tcp://node2-pvfs:3334
>>>>> Alias node3-pvfs tcp://node3-pvfs:3334
>>>>> Alias node4-pvfs tcp://node4-pvfs:3334
>>>>> Alias node5-pvfs tcp://node5-pvfs:3334
>>>>> Alias node6-pvfs tcp://node6-pvfs:3334
>>>>> Alias node7-pvfs tcp://node7-pvfs:3334
>>>>> Alias node8-pvfs tcp://node8-pvfs:3334
>>>>> Alias node9-pvfs tcp://node9-pvfs:3334
>>>>> </Aliases>
>>>>>
>>>>> <Filesystem>
>>>>> Name pvfs2-fs
>>>>> ID 1950640382
>>>>> RootHandle 1048576
>>>>> <MetaHandleRanges>
>>>>> Range master-pvfs 4-429496732
>>>>> </MetaHandleRanges>
>>>>> <DataHandleRanges>
>>>>> Range node1-pvfs 429496733-858993461
>>>>> Range node2-pvfs 858993462-1288490190
>>>>> Range node3-pvfs 1288490191-1717986919
>>>>> Range node4-pvfs 1717986920-2147483648
>>>>> Range node5-pvfs 2147483649-2576980377
>>>>> Range node6-pvfs 2576980378-3006477106
>>>>> Range node7-pvfs 3006477107-3435973835
>>>>> Range node8-pvfs 3435973836-3865470564
>>>>> Range node9-pvfs 3865470565-4294967293
>>>>> </DataHandleRanges>
>>>>> <StorageHints>
>>>>> TroveSyncMeta yes
>>>>> TroveSyncData no
>>>>> </StorageHints>
>>>>> </Filesystem>
>>>>> ####################
>>>>>
>>>>> The nodes are apparently working correctly: at boot the
>>>>> /etc/init.d/pvfs2 script worked, and the log file
>>>>> (/tmp/pvfs2-server.log) gives me for a node:
>>>>> ####################
>>>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>>>> ####################
>>>>>
>>>>> on the master instead, it gives
>>>>> ####################
>>>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>>>> [E 10/09 11:09]
>>>>> ***********************************************
>>>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>>>
>>>>> [E 10/09 11:09] Storage initialization failed. The most common
>>>>> reason
>>>>> for this is that the storage space has not yet been
>>>>> created or is located on a partition that has not yet
>>>>> been mounted. If you'd like to create the storage space,
>>>>> re-run this program with a -f option.
>>>>> [E 10/09 11:09]
>>>>> ***********************************************
>>>>> [E 10/09 11:09] Error: Could not initialize server interfaces;
>>>>> aborting.
>>>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>>>> ####################
>>>>>
>>>>> Now, the storage space on the nodes is populated:
>>>>> ####################
>>>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>>>> 744468fe collections.db lost+found storage_attributes.db
>>>>> ####################
>>>>> on the master (frontend) it is not:
>>>>> ####################
>>>>> [root@master ~]# ls /pvfs2-storage-space/
>>>>> 744468fe
>>>>> ####################
>>>>>
>>>>> Can anyone point me in the right direction?
>>>>>
>>>>> Thanks Again
>>>>>
>>>>> Raimondo
>>>>> _______________________________________________
>>>>> Pvfs2-users mailing list
>>>>> Pvfs2-users at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>
>>>
>>
>
>