[PVFS2-developers] does anyone know what this is? (fwd)

Rob Ross rross at mcs.anl.gov
Wed Jun 22 11:22:07 EDT 2005


If you've got time to knock it out, that would be great.  RobL and Sam 
are working on the test system and I'm dealing with a 7-hour time 
change, 90-degree days with no AC in the hotel, and a bunch of random 
deadlines...

Rob

Walter B. Ligon III wrote:
> --------
> Crap!  I cut and pasted the pvfs2tab line from the quickstart and *thought*
> I had edited it.  But I hadn't; it still said "testhost".
> 
> But we REALLY should catch this case and produce a meaningful error.
> Like "No route to host testhost" or "Host testhost refuses connection"
> or something.
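A minimal sketch of the kind of reporting Walt describes, assuming plain POSIX sockets rather than the actual BMI tcp code (`connect_with_message()` is a hypothetical name, not a BMI function). It separates a name-lookup failure, reported with gai_strerror() because errno is not meaningful there, from a failed connect(), whose errno is saved immediately so that a later close() cannot clobber it. A bogus entry such as "testhost" would typically fail at the lookup step rather than with "No route to host".

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of BMI): open a TCP connection to host:port.
 * Returns a connected socket fd (caller closes it), or -1 with a
 * human-readable reason written into msg. */
int connect_with_message(const char *host, const char *port,
                         char *msg, size_t msglen)
{
    struct addrinfo hints, *res, *ai;
    int rc, fd = -1, saved_errno = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        /* Lookup failed: errno is not the error source, gai_strerror() is. */
        snprintf(msg, msglen, "Cannot resolve host %s: %s",
                 host, gai_strerror(rc));
        return -1;
    }

    /* Try each address returned by the resolver until one connects. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            saved_errno = errno;
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                  /* connected */
        saved_errno = errno;        /* save before close() can clobber it */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);

    if (fd < 0)
        snprintf(msg, msglen, "Cannot connect to host %s port %s: %s",
                 host, port, strerror(saved_errno));
    return fd;
}
```

For example, connecting to a port nobody listens on yields a message such as "Cannot connect to host ... : Connection refused" instead of "Success".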
> 
> Can someone familiar with that code look at that, or do I need to?
> 
> Walt
> 
> 
>>Whoops, just realized that the earlier message in this thread was on the 
>>wrong list, moving over now.
>>
>>I'm not really sure, looking at the code, how you could get a zero errno 
>>value out of a failure in that path.  You may just need to set a gdb 
>>breakpoint on BMI_sockio_connect_sock() when you run pvfs2-ping and see 
>>if you can tell what's failing or why the errno value isn't being set.
>>
>>I guess it's possible that BMI_sockio_connect_sock() isn't even being 
>>called at all (see the blocks of code just before your bmi-tcp.c:1676 
>>error), but that shouldn't be the case in this client-side code path 
>>unless something has jumbled memory and cleared the hostname out of the 
>>BMI address structure, or the hostname was broken to begin with.
>>
>>Anything strange in your pvfs2tab or fstab files?  Maybe the hostname is 
>>empty or something?
>>
>>Actually, I just tried that: if I list the server as 
>>tcp://:3334/pvfs2-fs instead of tcp://localhost:3334/pvfs2-fs, then I 
>>see the same message you do.  The BMI address parser should probably 
>>check for that condition and stop things before they get that far.
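A minimal sketch of the check Phil suggests, assuming the server entry has the tcp://host:port/fsname form used in his example; `parse_tcp_addr()` and `struct tcp_addr` are hypothetical names for illustration, not the real BMI address parser. The idea is to reject an empty hostname at parse time, with a message naming the bad entry, before it ever reaches the socket code.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "tcp://host:port/fsname" (illustration only). */
struct tcp_addr {
    char host[256];
    int port;
    char fsname[256];
};

/* Returns 0 on success, or a negative errno value with a reason in msg. */
int parse_tcp_addr(const char *uri, struct tcp_addr *out,
                   char *msg, size_t msglen)
{
    const char *prefix = "tcp://";
    const char *host, *colon, *slash;
    char *end;
    long port;
    size_t hostlen;

    if (strncmp(uri, prefix, strlen(prefix)) != 0) {
        snprintf(msg, msglen, "%s: address must start with %s", uri, prefix);
        return -EINVAL;
    }
    host = uri + strlen(prefix);
    colon = strchr(host, ':');
    if (colon == NULL) {
        snprintf(msg, msglen, "%s: missing ':port'", uri);
        return -EINVAL;
    }
    hostlen = (size_t)(colon - host);
    if (hostlen == 0) {
        /* The case Phil reproduced: tcp://:3334/pvfs2-fs */
        snprintf(msg, msglen, "%s: empty hostname", uri);
        return -EINVAL;
    }
    if (hostlen >= sizeof(out->host)) {
        snprintf(msg, msglen, "%s: hostname too long", uri);
        return -ENAMETOOLONG;
    }

    /* Port must be a decimal number in 1..65535 followed by '/'. */
    errno = 0;
    port = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || errno != 0 || port <= 0 || port > 65535
        || *end != '/') {
        snprintf(msg, msglen, "%s: bad port", uri);
        return -EINVAL;
    }
    slash = end;
    if (slash[1] == '\0' || strlen(slash + 1) >= sizeof(out->fsname)) {
        snprintf(msg, msglen, "%s: bad file system name", uri);
        return -EINVAL;
    }

    memcpy(out->host, host, hostlen);
    out->host[hostlen] = '\0';
    out->port = (int)port;
    strcpy(out->fsname, slash + 1);
    return 0;
}
```

With this check, the tcp://:3334/pvfs2-fs entry from Phil's test is rejected with "tcp://:3334/pvfs2-fs: empty hostname" instead of surfacing later as "BMI_sockio_connect_sock: Success".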
>>
>>-Phil
>>
>>
>>>"bt" stands for backtrace.  If you have backtracing enabled, then any 
>>>gossip_lerr() call prints not only the line number the message occurred 
>>>on but also the stack trace.  The numbers off to the right are 
>>>addresses that you can convert to code locations with the "addr2line" 
>>>utility.
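As an illustration of where "[bt]" lines like the ones quoted below can come from, here is a minimal sketch using glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>; whether gossip uses exactly this mechanism is an assumption, and `print_backtrace()` is a hypothetical name. This glibc-specific API emits frames in the same "binary(symbol+offset) [address]" form seen in the logs; frames without an exported symbol show only a bare address.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 32

/* Hypothetical helper: write the current call stack to fd, one frame per
 * line. Returns the number of frames captured. */
int print_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* backtrace_symbols_fd() writes directly to fd and does not call
     * malloc(), so it remains usable when the heap may be damaged. */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Calling print_backtrace(STDERR_FILENO) from an error path prints the stack. Symbol names for functions in the main program generally require linking with -rdynamic. A bare address can be mapped to a source line with, for example, `addr2line -e bin/pvfs2-ping 0x8086d91`, provided the binary was built with debugging information.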
>>>
>>>The patch that I mentioned in response to Brad's email earlier happens 
>>>to also convert this gossip_lerr() call to a gossip_err() call; I don't 
>>>think that network/socket failures should result in a backtrace and a 
>>>line-number print.  It's pretty confusing, as you have discovered :)
>>>
>>>As far as why it is failing in the first place, I don't have any clue at 
>>>the moment...
>>>
>>>-Phil
>>>
>>>Walter B. Ligon III wrote:
>>>
>>>
>>>>I've built the latest CVS version (none of my changes), installed it,
>>>>and run pvfs2-ping, and I get this:
>>>>
>>>>[walt at sidious pvfs]> bin/pvfs2-ping -m /mnt/pvfs2
>>>>
>>>>(1) Parsing tab file...
>>>>
>>>>(2) Initializing system interface...
>>>>
>>>>(3) Initializing each file system found in tab file: /etc/pvfs2tab...
>>>>
>>>>[16:02:45.639438] src/io/bmi/bmi_tcp/bmi-tcp.c line 1676: Error: BMI_sockio_connect_sock: Success
>>>>[16:02:45.639742]       [bt] bin/pvfs2-ping [0x8086d91]
>>>>[16:02:45.639763]       [bt] bin/pvfs2-ping [0x8088923]
>>>>[16:02:45.639772]       [bt] bin/pvfs2-ping(BMI_tcp_post_sendunexpected_list+0xa6) [0x808664e]
>>>>[16:02:45.639781]       [bt] bin/pvfs2-ping(BMI_post_sendunexpected_list+0x166) [0x8073a2a]
>>>>[16:02:45.639790]       [bt] bin/pvfs2-ping(job_bmi_send_list+0x21b) [0x8078f07]
>>>>[16:02:45.639800]       [bt] bin/pvfs2-ping [0x807041f]
>>>>[16:02:45.639864]       [bt] bin/pvfs2-ping(vfprintf+0x3c9f) [0x8053973]
>>>>[16:02:45.639875]       [bt] bin/pvfs2-ping(PINT_client_state_machine_post+0x1cd) [0x8052bb9]
>>>>[16:02:45.639886]       [bt] bin/pvfs2-ping(PINT_server_get_config+0x12f) [0x8063ad7]
>>>>[16:02:45.639896]       [bt] bin/pvfs2-ping(PVFS_sys_fs_add+0xc6) [0x8053e02]
>>>>[16:02:45.639904]       [bt] bin/pvfs2-ping(main+0xdd) [0x805025d]
>>>>Broken pipe
>>>>[walt at sidious pvfs]>
>>>>
>>>>These are very similar error messages, only this time they have some
>>>>function names embedded.  I've never seen this kind of message out of
>>>>PVFS before; does anyone recognize these "bt" messages?
>>>>
>>>>Walt
>>>>
>>>>
>>>>
>>>>>I have a branch of the code I was working on at ANL, installed it down
>>>>>here, and it is doing very different things.  In particular, the client
>>>>>spit out this error, which I'm having a hard time understanding:
>>>>>
>>>>>[walt at sidious pvfs]> bin/create.set.get.eattr /foo key1 value1
>>>>>[15:04:42.072103] src/io/bmi/bmi_tcp/bmi-tcp.c line 1676: Error: BMI_sockio_connect_sock: Success
>>>>>[15:04:42.072264]       [bt] bin/create.set.get.eattr [0x8081a99]
>>>>>[15:04:42.072278]       [bt] bin/create.set.get.eattr [0x808362b]
>>>>>[15:04:42.072291]       [bt] bin/create.set.get.eattr [0x8081356]
>>>>>[15:04:42.072304]       [bt] bin/create.set.get.eattr [0x806d53a]
>>>>>[15:04:42.072316]       [bt] bin/create.set.get.eattr [0x80729b3]
>>>>>[15:04:42.072329]       [bt] bin/create.set.get.eattr [0x806a733]
>>>>>[15:04:42.072341]       [bt] bin/create.set.get.eattr(vfprintf+0x366f) [0x804cc5b]
>>>>>[15:04:42.072377]       [bt] bin/create.set.get.eattr(vfprintf+0x289d) [0x804be89]
>>>>>[15:04:42.072389]       [bt] bin/create.set.get.eattr [0x805ddeb]
>>>>>[15:04:42.072402]       [bt] bin/create.set.get.eattr [0x807c7aa]
>>>>>[15:04:42.072414]       [bt] bin/create.set.get.eattr [0x8068166]
>>>>>Broken pipe
>>>>>[walt at sidious pvfs]>
>>>>>
>>>>>As you can see, it threw an error in BMI.  I ran that down and found that
>>>>>the BMI function that connects the socket returned < 0, but the strerror()
>>>>>translation is "Success", which doesn't make much sense to me.
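The "Success" text follows from how strerror() works: if a function reports failure without any system call having set errno (for example, because an empty hostname is rejected before connect() is ever reached), errno still holds whatever it held before, often 0, and glibc's strerror(0) is "Success". A minimal sketch of the pattern and one fix, using hypothetical function names that are not part of BMI: every failure path that is not a failed system call sets errno itself (returning a negative error code is an equivalent alternative).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical buggy version: rejects an empty hostname but never sets
 * errno, so a caller printing strerror(errno) reports a stale value. */
int validate_host_buggy(const char *host)
{
    if (host == NULL || host[0] == '\0')
        return -1;              /* errno left untouched */
    return 0;
}

/* Hypothetical fixed version: the non-syscall failure path sets errno,
 * so strerror(errno) describes the actual problem ("Invalid argument"). */
int validate_host_fixed(const char *host)
{
    if (host == NULL || host[0] == '\0') {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```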
>>>>>
>>>>>Then there are all of these lines starting with [bt], followed by the
>>>>>command-line string of the client program and an unknown hex value.  I
>>>>>have no idea where those are coming from.
>>>>>
>>>>>Have I misconfigured something?  I thought I configured it just like I
>>>>>did the last time on this RedHat EL box.  Anyone recognize this?  It may
>>>>>have something to do with my code, but my code should not have come into
>>>>>play yet, unless I did something terribly wrong.
>>>>>
>>>>>Walt
>>>>>
>>>
>>>_______________________________________________
>>>PVFS-developers mailing list
>>>PVFS-developers at www.beowulf-underground.org
>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers
>>
>>_______________________________________________
>>PVFS2-developers mailing list
>>PVFS2-developers at beowulf-underground.org
>>http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>
> 
> 

