[PVFS2-developers] does anyone know what this is? (fwd)

Phil Carns pcarns at wastedcycles.org
Wed Jun 22 19:55:00 EDT 2005


Wow, that's amazing that this was never caught before in sockio.  The 
PVFS1 code path is the same here.  I wonder if we lucked out in the 
past, maybe with glibcs that happened to set errno as well even though 
it isn't required.  At any rate, I would think the error code could just 
be converted on the spot with a macro (bmi_h_errno_to_pvfs(), to go 
along with bmi_errno_to_pvfs()).  The man page only lists a handful of 
possible h_errno values to look for.
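
A rough sketch of what such a conversion might look like, written as a plain function rather than a macro so it is easier to read.  Everything named demo_* below is hypothetical and made up for illustration; the codes are placeholders, not actual PVFS2 or BMI error values, and the real bmi_h_errno_to_pvfs() may end up looking quite different.  The empty-hostname guard corresponds to the tcp://:3334/pvfs2-fs case quoted further down.

```c
#define _DEFAULT_SOURCE   /* expose the h_errno constants under strict glibc modes */
#include <netdb.h>        /* gethostbyname, h_errno, HOST_NOT_FOUND, ... */
#include <netinet/in.h>   /* struct in_addr */
#include <sys/socket.h>   /* AF_INET */
#include <string.h>       /* memcpy */

/* Placeholder error codes for this sketch only; not actual PVFS2/BMI codes. */
enum demo_err {
    DEMO_OK            =  0,
    DEMO_EHOSTNOTFOUND = -1,  /* HOST_NOT_FOUND: no such host */
    DEMO_ETRYAGAIN     = -2,  /* TRY_AGAIN: temporary name server failure */
    DEMO_ENORECOVERY   = -3,  /* NO_RECOVERY: unrecoverable name server error */
    DEMO_ENOADDRESS    = -4,  /* NO_DATA: name is valid but has no address */
    DEMO_EBADHOST      = -5,  /* empty or missing hostname */
    DEMO_EUNKNOWN      = -6   /* any h_errno value not listed above */
};

/* Hypothetical counterpart to the bmi_h_errno_to_pvfs() idea: convert an
 * h_errno value on the spot, since it lives in a different number space
 * than errno and cannot be handed to strerror(). */
int demo_h_errno_to_code(int herr)
{
    switch (herr) {
    case HOST_NOT_FOUND: return DEMO_EHOSTNOTFOUND;
    case TRY_AGAIN:      return DEMO_ETRYAGAIN;
    case NO_RECOVERY:    return DEMO_ENORECOVERY;
    case NO_DATA:        return DEMO_ENOADDRESS;  /* glibc defines NO_ADDRESS as NO_DATA */
    default:             return DEMO_EUNKNOWN;
    }
}

/* Resolve an IPv4 hostname; returns DEMO_OK or one of the negative codes. */
int demo_resolve(const char *host, struct in_addr *out)
{
    struct hostent *hep;

    /* Reject an empty hostname before it ever reaches the resolver. */
    if (host == NULL || host[0] == '\0')
        return DEMO_EBADHOST;

    hep = gethostbyname(host);
    if (hep == NULL)
        return demo_h_errno_to_code(h_errno);  /* errno is not reliable here */

    if (hep->h_addrtype != AF_INET || hep->h_length != (int)sizeof(*out))
        return DEMO_ENOADDRESS;

    memcpy(out, hep->h_addr_list[0], sizeof(*out));
    return DEMO_OK;
}
```

With something along these lines, the connect path could report "host testhost not found" instead of passing a zero errno to strerror() and printing "Success".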

-Phil


Walter B. Ligon III wrote:
> -------- 
> 
> OK, I believe the problem is that gethostbyname returns its error in
> h_errno rather than errno.  At least according to my man pages.  I
> will have to see whether h_errno codes overlap with errno codes, but
> I am assuming they do (otherwise why do it that way).  So the
> question is how to pass that error back out.  I can certainly use
> gossip to log an error on the spot.  Otherwise I can probably map
> h_errno codes to some unique codes and decode them later in gossip.
> 
> Any other advice and/or suggestions on handling this?  Otherwise I'll
> just deal with it.
> 
> Walt
> 
> 
>>If you've got time to knock it out, that would be great.  RobL and Sam 
>>are working on the test system and I'm dealing with a 7-hour time 
>>change, 90 degree days with no AC in the hotel, and a bunch of random 
>>deadlines...
>>
>>Rob
>>
>>Walter B. Ligon III wrote:
>>
>>>--------
>>>Crap!  I cut/paste the pvfs2tab line from the quickstart and *thought*
>>>I had edited it.  But I didn't - still said "testhost"
>>>
>>>But we REALLY should catch this error and produce a meaningful error.
>>>Like "No route to host testhost" or "Host testhost refuses connection"
>>>or something.
>>>
>>>Can someone familiar with that code look at that, or do I need to?
>>>
>>>Walt
>>>
>>>
>>>
>>>>Whoops, just realized that the earlier message in this thread was on the 
>>>>wrong list, moving over now.
>>>>
>>>>I'm not really sure looking at the code how you could get a zero errno 
>>>>value out of a failure in that path.  You may need to just gdb break on 
>>>>BMI_sockio_connect_sock() when you run pvfs2-ping and see if you can 
>>>>tell what's failing or why the errno value isn't being set.
>>>>
>>>>I guess it's possible that BMI_sockio_connect_sock() isn't even being 
>>>>called at all (see blocks of code just before your bmi-tcp.c:1676 
>>>>error), but that shouldn't be the case in this client side code path 
>>>>unless something has jumbled memory and cleared the hostname out of the 
>>>>bmi address structure, or if the hostname was broken somehow to begin with.
>>>>
>>>>Anything strange in your pvfs2tab or fstab files?  Maybe the hostname is 
>>>>empty or something?
>>>>
>>>>Actually I just tried that: if I list the server as 
>>>>tcp://:3334/pvfs2-fs instead of tcp://localhost:3334/pvfs2-fs, then I 
>>>>see the same message you do.  The bmi address parser should probably 
>>>>check for that condition and stop things before it gets that far.
>>>>
>>>>-Phil
>>>>
>>>>
>>>>
>>>>>"bt" stands for back trace.  If you have back tracing enabled, then any 
>>>>>gossip_lerr() call not only prints the line number the message occurred 
>>>>>on, but also the stack trace.  The numbers off to the right are 
>>>>>addresses that you can convert to code locations with the "addr2line" 
>>>>>utility.
>>>>>
>>>>>The patch that I mentioned in response to Brad's email earlier happens 
>>>>>to also convert this gossip_lerr() call to a gossip_err() call; I don't 
>>>>>think that network/socket failures should result in a backtrace and line 
>>>>>number print; it's pretty confusing, as you have discovered :)
>>>>>
>>>>>As far as why it is failing in the first place, I don't have any clue at 
>>>>>the moment...
>>>>>
>>>>>-Phil
>>>>>
>>>>>Walter B. Ligon III wrote:
>>>>>
>>>>>
>>>>>
>>>>>>I've built the latest CVS build - none of my changes, installed it and
>>>>>>run pvfs2-ping, and I get this:
>>>>>>
>>>>>>[walt at sidious pvfs]> bin/pvfs2-ping -m /mnt/pvfs2
>>>>>>
>>>>>>(1) Parsing tab file...
>>>>>>
>>>>>>(2) Initializing system interface...
>>>>>>
>>>>>>(3) Initializing each file system found in tab file: /etc/pvfs2tab...
>>>>>>
>>>>>>[16:02:45.639438] src/io/bmi/bmi_tcp/bmi-tcp.c line 1676: Error: BMI_sockio_connect_sock: Success
>>>>>>[16:02:45.639742]       [bt] bin/pvfs2-ping [0x8086d91]
>>>>>>[16:02:45.639763]       [bt] bin/pvfs2-ping [0x8088923]
>>>>>>[16:02:45.639772]       [bt] bin/pvfs2-ping(BMI_tcp_post_sendunexpected_list+0xa6) [0x808664e]
>>>>>>[16:02:45.639781]       [bt] bin/pvfs2-ping(BMI_post_sendunexpected_list+0x166) [0x8073a2a]
>>>>>>[16:02:45.639790]       [bt] bin/pvfs2-ping(job_bmi_send_list+0x21b) [0x8078f07]
>>>>>>[16:02:45.639800]       [bt] bin/pvfs2-ping [0x807041f]
>>>>>>[16:02:45.639864]       [bt] bin/pvfs2-ping(vfprintf+0x3c9f) [0x8053973]
>>>>>>[16:02:45.639875]       [bt] bin/pvfs2-ping(PINT_client_state_machine_post+0x1cd) [0x8052bb9]
>>>>>>[16:02:45.639886]       [bt] bin/pvfs2-ping(PINT_server_get_config+0x12f) [0x8063ad7]
>>>>>>[16:02:45.639896]       [bt] bin/pvfs2-ping(PVFS_sys_fs_add+0xc6) [0x8053e02]
>>>>>>[16:02:45.639904]       [bt] bin/pvfs2-ping(main+0xdd) [0x805025d]
>>>>>>Broken pipe
>>>>>>[walt at sidious pvfs]>
>>>>>>
>>>>>>Very similar error messages, only this time they have some function names
>>>>>>embedded.  I've never seen this kind of message out of PVFS before.  Does
>>>>>>no one recognize these "bt" messages?
>>>>>>
>>>>>>Walt
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>I have a branch of the code I was working on at ANL, installed it down here
>>>>>>>and it is doing very different things.  In particular the client spit out
>>>>>>>this error which I'm having a hard time understanding:
>>>>>>>
>>>>>>>[walt at sidious pvfs]> bin/create.set.get.eattr /foo key1 value1
>>>>>>>[15:04:42.072103] src/io/bmi/bmi_tcp/bmi-tcp.c line 1676: Error: BMI_sockio_connect_sock: Success
>>>>>>>[15:04:42.072264]       [bt] bin/create.set.get.eattr [0x8081a99]
>>>>>>>[15:04:42.072278]       [bt] bin/create.set.get.eattr [0x808362b]
>>>>>>>[15:04:42.072291]       [bt] bin/create.set.get.eattr [0x8081356]
>>>>>>>[15:04:42.072304]       [bt] bin/create.set.get.eattr [0x806d53a]
>>>>>>>[15:04:42.072316]       [bt] bin/create.set.get.eattr [0x80729b3]
>>>>>>>[15:04:42.072329]       [bt] bin/create.set.get.eattr [0x806a733]
>>>>>>>[15:04:42.072341]       [bt] bin/create.set.get.eattr(vfprintf+0x366f) [0x804cc5b]
>>>>>>>[15:04:42.072377]       [bt] bin/create.set.get.eattr(vfprintf+0x289d) [0x804be89]
>>>>>>>[15:04:42.072389]       [bt] bin/create.set.get.eattr [0x805ddeb]
>>>>>>>[15:04:42.072402]       [bt] bin/create.set.get.eattr [0x807c7aa]
>>>>>>>[15:04:42.072414]       [bt] bin/create.set.get.eattr [0x8068166]
>>>>>>>Broken pipe
>>>>>>>[walt at sidious pvfs]>
>>>>>>>As you see it threw an error in BMI.  Ran that down and found where
>>>>>>>the BMI function that connects the socket returned <0 but the strerror
>>>>>>>translation is "Success" which doesn't make much sense to me.
>>>>>>>
>>>>>>>Then all of these lines starting [bt] followed by the command line
>>>>>>>string of the client program, and an unknown hex value.  I have no idea
>>>>>>>where that is coming from.
>>>>>>>
>>>>>>>Have I misconfigured something?  I thought I configured just like I did
>>>>>>>the last time on this RedHat EL box.  Anyone recognize this?  It may have
>>>>>>>something to do with my code, but my code should not have come into play
>>>>>>>yet, unless I did something terribly wrong.
>>>>>>>
>>>>>>>Walt
>>>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>PVFS-developers mailing list
>>>>>PVFS-developers at www.beowulf-underground.org
>>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs-developers
>>>>
>>>>_______________________________________________
>>>>PVFS2-developers mailing list
>>>>PVFS2-developers at beowulf-underground.org
>>>>http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>>
>>>
>>>
> 


