[Pvfs2-users] Problem with restarting pvfs on a cluster

Sam Lang slang at mcs.anl.gov
Tue Oct 9 14:45:58 EDT 2007


On Oct 9, 2007, at 12:57 PM, Raimondo Giammanco wrote:

> Hello Mr. Lang,
>
> the master is a different type of unit from the nodes, which are
> blades in a rack-mounted cluster.
>
> The mount command gives the following on the master:
> ##################
> /dev/sda1 on / type ext3 (rw)
> none on /proc type proc (rw)
> none on /sys type sysfs (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> usbfs on /proc/bus/usb type usbfs (rw)
> none on /dev/shm type tmpfs (rw)
> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> nfsd on /proc/fs/nfsd type nfsd (rw)
> ##################
>
>
> while on the node it gives:
> ##################
> /dev/ram0 on / type ext2 (rw)
> none on /proc type proc (rw)
> none on /sys type sysfs (rw)
> none on /dev/pts type devpts (rw,gid=5,mode=620)
> usbfs on /proc/bus/usb type usbfs (rw)
> none on /dev/shm type tmpfs (rw)
> /dev/md0 on /tmp type ext3 (rw)
> /dev/md1 on /pvfs2-storage-space type ext3 (rw)
> none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
> sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
> 10.0.0.254:/home on /home type nfs (rw,addr=10.0.0.254)
> 10.0.0.254:/usr on /usr type nfs (rw,addr=10.0.0.254)
> 10.0.0.254:/opt on /opt type nfs (rw,addr=10.0.0.254)
> nfsd on /proc/fs/nfsd type nfsd (rw)
> #####################
>
>
> The difference is, I believe, that the master has a hardware raid,

Is the hardware raid /dev/sda1 mounted to / ?  If not, maybe the  
hardware raid on the master needs to be mounted to /pvfs2-storage-space?

> while the nodes have 2 small hard disks in software RAID for the system
> and temporary data, and 2 big ones, also in software RAID, for pvfs.

Ok that explains the lost+found.  FYI, while the /pvfs2-storage-space  
may exist as a directory in /, it can also be a mountpoint for  
something else, so its contents may not be visible (at least the  
contents you would expect) if you haven't mounted everything properly.
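
To check which device actually backs a directory, the 'mount' output can
be parsed. What follows is only a minimal sketch, assuming Python is
available on the machines. The function names mount_points and
backing_device are hypothetical helpers and not part of PVFS2, and the
sample lines are a subset of the 'mount' output quoted in this thread.

```python
def mount_points(mount_output):
    """Parse lines of the form 'DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)'
    into a dict {mountpoint: (device, fstype)}."""
    table = {}
    for line in mount_output.splitlines():
        line = line.strip()
        if " on " not in line or " type " not in line:
            continue  # skip blank or unrecognized lines
        device, rest = line.split(" on ", 1)
        mountpoint, rest = rest.split(" type ", 1)
        fstype = rest.split(" ", 1)[0]
        table[mountpoint] = (device, fstype)
    return table


def backing_device(mount_output, path):
    """Return (mountpoint, (device, fstype)) for the longest mountpoint
    that is a prefix of path, i.e. the filesystem the path lives on."""
    table = mount_points(mount_output)
    best = "/"
    for mp in table:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > len(best):
                best = mp
    return best, table.get(best)


# Subset of the 'mount' output quoted in this thread.
MASTER_MOUNTS = """\
/dev/sda1 on / type ext3 (rw)
none on /proc type proc (rw)
"""

NODE_MOUNTS = """\
/dev/ram0 on / type ext2 (rw)
/dev/md0 on /tmp type ext3 (rw)
/dev/md1 on /pvfs2-storage-space type ext3 (rw)
"""

# On the master the storage directory lives on the root filesystem;
# on the node it is its own mountpoint backed by /dev/md1.
print(backing_device(MASTER_MOUNTS, "/pvfs2-storage-space"))
print(backing_device(NODE_MOUNTS, "/pvfs2-storage-space"))
```

On a live system, os.path.ismount("/pvfs2-storage-space") from the Python
standard library answers the narrower question of whether the directory
is currently a mountpoint at all.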

-sam

>
> Regards,
> Raimondo
>
>
>>
>> On Oct 9, 2007, at 9:40 AM, Giammanco Raimondo wrote:
>>
>>> Hello Mr. Ross,
>>>
>>> thanks for your prompt reply.
>>>
>>> I believe the config file you mention is (in my case)
>>> /etc/pvfs2-server.conf-master-pvfs.
>>> Its contents are:
>>> ############################
>>> StorageSpace /pvfs2-storage-space
>>> HostID "tcp://master-pvfs:3334"
>>> LogFile /tmp/pvfs2-server.log
>>> ############################
>>>
>>> The config file for a node, /etc/pvfs2-server.conf-node1-pvfs for
>>> example, is the following:
>>> ############################
>>> StorageSpace /pvfs2-storage-space
>>> HostID "tcp://node1-pvfs:3334"
>>> LogFile /tmp/pvfs2-server.log
>>> ############################
>>>
>>> Now, this /pvfs2-storage-space is directly on /, so unfortunately
>>> the wrong-mount-timing theory has to be discarded.
>>
>> In the directory listing you gave us for node1 /pvfs2-storage-space,
>> there's a lost+found directory.  That only appears if you've mounted
>> another volume into that directory.  My guess is that for the master
>> node, you've managed to somehow create part of the storage space
>> before mounting something to /pvfs2-storage-space, and the rest was
>> created after.  You're only seeing what was created before the
>> mount.  That's just a guess though.  Can you send us the output of
>> 'mount' on node1 and master?
>>
>> -sam
>>
>>>
>>> On the nodes, by contrast, /pvfs2-storage-space is on a mounted
>>> filesystem, /dev/md1, but there everything apparently works, so it
>>> seems to me that the problem really is with the master node, which
>>> is the metadata server.
>>>
>>> The suggestion in the pvfs2-server log to use the -f option looks
>>> very dangerous to me. Or is it safe in the case of the metadata
>>> server, in the sense that it will reconstruct the data from the IO
>>> nodes? I cannot understand why the different storage spaces have
>>> the directory "744468fe" in common, while the master has nothing
>>> else besides this empty directory.
>>>
>>> Even if the pvfs2-server process had been killed uncleanly on the
>>> master (the metadata server), it would not have been able (I assume)
>>> to delete data in the storage directory...
>>>
>>> So this absence of data in  /pvfs2-storage-space for the metadata
>>> server is both disconcerting and confusing...
>>>
>>> Hope this mail will help us to proceed further.
>>>
>>> Best Regards
>>> Raimondo
>>>
>>> Rob Ross wrote:
>>>> Hi Raimondo,
>>>>
>>>> Two things. One, there is a second config file around that
>>>> specifies the storage directory etc. You should be able to find it
>>>> in /etc/ also. Please send that to us.
>>>>
>>>> An idea is that perhaps /pvfs2-storage-space is a mounted file
>>>> system, and that somehow it is getting mounted *after* the server
>>>> is started? Just a blind guess. If you try to start the service
>>>> after the system has finished booting, does it do the same thing?
>>>>
>>>> Thanks,
>>>>
>>>> Rob
>>>>
>>>> Raimondo Giammanco wrote:
>>>>> Hello, there.
>>>>>
>>>>> I am coming here seeking words of wisdom. I have searched the web
>>>>> and this list but cannot find any useful information, so I am
>>>>> posting here. I apologize if the answer to this question has
>>>>> already been provided and I could not find it.
>>>>>
>>>>> I have a problem with a pvfs2 installation that was set up by a
>>>>> third party. The cluster was shut down cleanly for scheduled
>>>>> maintenance on the power lines, and I cannot bring pvfs2 up again.
>>>>>
>>>>> Here is the description.
>>>>>
>>>>> There is a cluster consisting of a frontend and 9 nodes.
>>>>>
>>>>> As far as I understand, the frontend is the metadata server and the
>>>>> nodes are IO servers, according to the /etc/pvfs2-fs.conf file
>>>>> shown below:
>>>>>
>>>>> ####################
>>>>> <Defaults>
>>>>>         UnexpectedRequests 50
>>>>>         EventLogging none
>>>>>         LogStamp datetime
>>>>>         BMIModules bmi_tcp
>>>>>         FlowModules flowproto_multiqueue
>>>>>         PerfUpdateInterval 1000
>>>>>         ServerJobBMITimeoutSecs 30
>>>>>         ServerJobFlowTimeoutSecs 30
>>>>>         ClientJobBMITimeoutSecs 300
>>>>>         ClientJobFlowTimeoutSecs 300
>>>>>         ClientRetryLimit 5
>>>>>         ClientRetryDelayMilliSecs 2000
>>>>> </Defaults>
>>>>>
>>>>> <Aliases>
>>>>>         Alias master-pvfs tcp://master-pvfs:3334
>>>>>         Alias node1-pvfs tcp://node1-pvfs:3334
>>>>>         Alias node2-pvfs tcp://node2-pvfs:3334
>>>>>         Alias node3-pvfs tcp://node3-pvfs:3334
>>>>>         Alias node4-pvfs tcp://node4-pvfs:3334
>>>>>         Alias node5-pvfs tcp://node5-pvfs:3334
>>>>>         Alias node6-pvfs tcp://node6-pvfs:3334
>>>>>         Alias node7-pvfs tcp://node7-pvfs:3334
>>>>>         Alias node8-pvfs tcp://node8-pvfs:3334
>>>>>         Alias node9-pvfs tcp://node9-pvfs:3334
>>>>> </Aliases>
>>>>>
>>>>> <Filesystem>
>>>>>         Name pvfs2-fs
>>>>>         ID 1950640382
>>>>>         RootHandle 1048576
>>>>>         <MetaHandleRanges>
>>>>>                 Range master-pvfs 4-429496732
>>>>>         </MetaHandleRanges>
>>>>>         <DataHandleRanges>
>>>>>                 Range node1-pvfs 429496733-858993461
>>>>>                 Range node2-pvfs 858993462-1288490190
>>>>>                 Range node3-pvfs 1288490191-1717986919
>>>>>                 Range node4-pvfs 1717986920-2147483648
>>>>>                 Range node5-pvfs 2147483649-2576980377
>>>>>                 Range node6-pvfs 2576980378-3006477106
>>>>>                 Range node7-pvfs 3006477107-3435973835
>>>>>                 Range node8-pvfs 3435973836-3865470564
>>>>>                 Range node9-pvfs 3865470565-4294967293
>>>>>         </DataHandleRanges>
>>>>>         <StorageHints>
>>>>>                 TroveSyncMeta yes
>>>>>                 TroveSyncData no
>>>>>         </StorageHints>
>>>>> </Filesystem>
>>>>> ####################
>>>>>
>>>>> The nodes are apparently working correctly: at boot the
>>>>> /etc/init.d/pvfs2 script ran, and the log file
>>>>> (/tmp/pvfs2-server.log) on a node shows:
>>>>> ####################
>>>>> [D 10/08 14:39] PVFS2 Server version 2.6.2 starting.
>>>>> ####################
>>>>>
>>>>> On the master, instead, it shows:
>>>>> ####################
>>>>> [D 10/09 11:09] PVFS2 Server version 2.6.2 starting.
>>>>> [E 10/09 11:09] Error: trove_initialize: No such file or directory
>>>>> [E 10/09 11:09]
>>>>> ***********************************************
>>>>> [E 10/09 11:09] Invalid Storage Space: /pvfs2-storage-space
>>>>>
>>>>> [E 10/09 11:09] Storage initialization failed.  The most common
>>>>> reason
>>>>> for this is that the storage space has not yet been
>>>>> created or is located on a partition that has not yet
>>>>> been mounted.  If you'd like to create the storage space,
>>>>> re-run this program with a -f option.
>>>>> [E 10/09 11:09]
>>>>> ***********************************************
>>>>> [E 10/09 11:09] Error: Could not initialize server interfaces;
>>>>> aborting.
>>>>> [E 10/09 11:09] Error: Could not initialize server; aborting.
>>>>> ####################
>>>>>
>>>>> Now, the storage space on the nodes is populated:
>>>>> ####################
>>>>> [root@node1 ~]# ls /pvfs2-storage-space/
>>>>> 744468fe  collections.db  lost+found  storage_attributes.db
>>>>> ####################
>>>>> but on the master (frontend) it is not:
>>>>> ####################
>>>>> [root@master ~]# ls /pvfs2-storage-space/
>>>>> 744468fe
>>>>> ####################
>>>>>
>>>>> Can anyone point me in the right direction?
>>>>>
>>>>> Thanks Again
>>>>>
>>>>> Raimondo
>>>>> _______________________________________________
>>>>> Pvfs2-users mailing list
>>>>> Pvfs2-users at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>
>>>
>>
>
>


