[PVFS-users] pvfs mounting problems with minimalistic kernel
Rob Ross
rross at mcs.anl.gov
Sun May 16 15:16:33 EDT 2004
Hi Kaveh,
I think the problem is that your diskless nodes are missing /etc/protocols.
Try passing the "--enable-scyld" option to configure and rebuilding; the
fixes that option enables should, I think, address what you are seeing.
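
A quick way to check this diagnosis on a node is sketched below. It assumes
the "Kernel does not support tcp" message comes from a failed protocol-name
lookup (the standard getprotobyname call, which on Linux normally reads
/etc/protocols), and that Python is available on the node; neither is
confirmed in this thread. The function name lookup_protocol is made up for
illustration.

```python
import socket


def lookup_protocol(name="tcp"):
    """Return the protocol number for `name`, or None if the lookup fails.

    socket.getprotobyname wraps the C library's getprotobyname(), which
    typically consults /etc/protocols. If that file is missing, the
    lookup is expected to fail even though the kernel supports TCP.
    """
    try:
        return socket.getprotobyname(name)
    except OSError:
        return None


if __name__ == "__main__":
    number = lookup_protocol("tcp")
    if number is None:
        print("tcp not found: check that /etc/protocols exists on this node")
    else:
        print("tcp resolves to protocol number", number)
```

If the lookup fails on a diskless node but succeeds on the server, that
points at the missing file rather than at the kernel or the network.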
As an aside, the reason that you see the afs stuff is that /etc/services
maps port 7000 to afs. That's all that means.
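
The same mapping can be seen without tcpdump. The sketch below asks the
system's services database for the name attached to a port; service_name is
a hypothetical helper, and the result depends entirely on the local
/etc/services, so no particular output is guaranteed.

```python
import socket


def service_name(port, proto="tcp"):
    """Return the service name registered for `port`, or None if there is none.

    socket.getservbyport wraps the C library's getservbyport(), which
    typically reads /etc/services. The name is only a label for the port
    number; it says nothing about which program is actually listening.
    """
    try:
        return socket.getservbyport(port, proto)
    except OSError:
        return None


if __name__ == "__main__":
    for port in (3000, 7000):
        print(port, "->", service_name(port))
```

On a machine whose /etc/services carries the usual entry, port 7000 would be
expected to come back as afs3-fileserver, matching the tcpdump output,
whether or not AFS is installed.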
Sorry this didn't occur to me sooner. Let me know how that works.
Rob
On Sun, 16 May 2004, Kaveh Moallemi wrote:
>
> Hi Rob,
>
> by running: pvfs-ping -h 10.0.0.1 -p 3000 -f /pvfs-meta
> I get the following:
>
> Kernel does not support tcp: No such file or directory
> pvfs-ping: unable to connect to 10.0.0.1:3000.
> mgr (10.0.0.1:3000) is down.
> pvfs file system /pvfs-meta has issues.
>
> mgr-ping also gives similar results. And if I run tcpdump on the server, I
> just get:
>
> tcpdump: listening on eth0
>
> Nothing changes as the slave attempts to pvfs-ping or mount, but if I
> ping (standard ping) the server from the slave node, tcpdump gives the
> following:
>
> tcpdump: listening on eth0
> 11:14:50.013282 arp who-has node1 tell node2
> 11:14:50.013305 arp reply node1 is-at 0:50:ba:d2:44:ec
> 11:14:50.013372 node2 > node1: icmp: echo request (DF)
> 11:14:50.013417 node1 > node2: icmp: echo reply
> 11:14:51.013462 node2 > node1: icmp: echo request (DF)
> 11:14:51.013492 node1 > node2: icmp: echo reply
> 11:14:55.012741 arp who-has node2 tell node1
> 11:14:55.012808 arp reply node2 is-at 0:2:2a:b6:48:72
>
> Now with telnet from node2:
>
> telnet 10.0.0.1 3000
>
> gives the following on node1:
>
> 11:40:12.482060 node2.32771 > node1.3000: S 2713888958:2713888958(0) win 5840 <mss 1460,sackOK,timestamp 381265 0,nop,wscale 0> (DF)
> 11:40:12.482121 arp who-has node2 tell node1
> 11:40:12.482185 arp reply node2 is-at 0:2:2a:b6:48:72
> 11:40:12.482197 node1.3000 > node2.32771: S 3639396032:3639396032(0) ack 2713888959 win 5792 <mss 1460,sackOK,timestamp 42224644 381265,nop,wscale 0> (DF)
> 11:40:12.482278 node2.32771 > node1.3000: . ack 1 win 5840 <nop,nop,timestamp 381265 42224644> (DF)
> 11:40:13.471598 node2.32771 > node1.3000: P 1:2(1) ack 1 win 5840 <nop,nop,timestamp 381364 42224644> (DF)
> 11:40:13.471634 node1.3000 > node2.32771: . ack 2 win 5792 <nop,nop,timestamp 42224743 381364> (DF)
> 11:40:13.878878 node2.32771 > node1.3000: P 2:3(1) ack 1 win 5840 <nop,nop,timestamp 381405 42224743> (DF)
> 11:40:13.878909 node1.3000 > node2.32771: . ack 3 win 5792 <nop,nop,timestamp 42224784 381405> (DF)
> 11:40:14.243800 node2.32771 > node1.3000: P 3:4(1) ack 1 win 5840 <nop,nop,timestamp 381442 42224784> (DF)
> 11:40:14.243834 node1.3000 > node2.32771: . ack 4 win 5792 <nop,nop,timestamp 42224821 381442> (DF)
> 11:41:14.262860 node1.3000 > node2.32771: F 1:1(0) ack 4 win 5792 <nop,nop,timestamp 42230823 381442> (DF)
> 11:41:14.263114 node2.32771 > node1.3000: F 4:4(0) ack 2 win 5840 <nop,nop,timestamp 387443 42230823> (DF)
> 11:41:14.263143 node1.3000 > node2.32771: . ack 5 win 5792 <nop,nop,timestamp 42230823 387443> (DF)
>
> And, telnet 10.0.0.1 7000
> gives the following on node1:
>
> tcpdump: listening on eth0
> 11:33:16.819182 arp who-has node1 tell node2
> 11:33:16.819223 arp reply node1 is-at 0:50:ba:d2:44:ec
> 11:33:16.819290 node2.32769 > node1.afs3-fileserver: S 2284947310:2284947310(0) win 5840 <mss 1460,sackOK,timestamp 339704 0,nop,wscale 0> (DF)
> 11:33:16.819340 node1.afs3-fileserver > node2.32769: S 3204858363:3204858363(0) ack 2284947311 win 5792 <mss 1460,sackOK,timestamp 42183078 339704,nop,wscale 0> (DF)
> 11:33:16.819444 node2.32769 > node1.afs3-fileserver: . ack 1 win 5840 <nop,nop,timestamp 339704 42183078> (DF)
> 11:33:25.730181 node2.32769 > node1.afs3-fileserver: P 1:2(1) ack 1 win 5840 <nop,nop,timestamp 340595 42183078> (DF)
> 11:33:25.730226 node1.afs3-fileserver > node2.32769: . ack 2 win 5792 <nop,nop,timestamp 42183969 340595> (DF)
> 11:35:10.797082 node1.afs3-fileserver > node2.32769: F 1:1(0) ack 2 win 5792 <nop,nop,timestamp 42194476 340595> (DF)
> 11:35:10.797349 node2.32769 > node1.afs3-fileserver: F 2:2(0) ack 2 win 5840 <nop,nop,timestamp 351101 42194476> (DF)
> 11:35:10.797377 node1.afs3-fileserver > node2.32769: . ack 3 win 5792 <nop,nop,timestamp 42194476 351101> (DF)
>
> Interesting, why is port 7000 designated as "afs3-fileserver" (I don't
> have afs installed)?
>
>
> Thank you Rob,
>
> Kaveh
>
> >From: Rob Ross <rross at mcs.anl.gov>
> >To: Kaveh Moallemi <kmoallem at hotmail.com>
> >CC: pvfs-users at beowulf-underground.org
> >Subject: Re: [PVFS-users] pvfs mounting problems with minimalistic kernel
> >Date: Sat, 15 May 2004 12:29:49 -0500 (CDT)
> >
> >Hi Kaveh,
> >
> >[ Pinging has been verified to work. ]
> >
> >I would suggest that you try putting the pvfs-ping utility out on one of
> >the nodes that isn't working. You can then run it with:
> >
> > pvfs-ping -h 10.0.0.1 -p 3000 -f /pvfs-meta
> >
> >This will attempt to connect to the mgr (and if that succeeds, to the
> >iods). This program has somewhat more helpful error messages, so it might
> >help us figure out what is going on. It may also just print out "server
> >not responding" though; we'll see.
> >
> >Do you have tcpdump on the server, and are you familiar with its usage?
> >That might be a next step.
> >
> >Thanks,
> >
> >Rob
> >
>
>