[Pvfs2-users] metadata problems with pvfs-2.6.0
Vikrant Kumar
vsk at fluent.co.in
Fri Dec 29 10:10:06 EST 2006
Hello Murali,
In the client logs I'm seeing some errors related to this:
...
[E 08:05:59.628334] PINT_cached_config_get_server_name failed: Invalid argument
[E 08:05:59.628411] Failed to map server address to handle
[E 08:05:59.628422] src/client/sysint/sys-getattr.sm line 708: Error: failed to resolve meta server addresses.
[E 08:05:59.628544] [bt] pvfs2-client-core [0x416f28]
[E 08:05:59.628555] [bt] pvfs2-client-core [0x4159a8]
[E 08:05:59.628564] [bt] pvfs2-client-core(PINT_client_state_machine_testsome+0x1a0) [0x415e90]
[E 08:05:59.628574] [bt] pvfs2-client-core [0x411877]
[E 08:05:59.628583] [bt] pvfs2-client-core(main+0x465) [0x4133c5]
[E 08:05:59.628592] [bt] /lib64/tls/libc.so.6(__libc_start_main+0xea) [0x2a95cd4aaa]
[E 08:05:59.628601] [bt] pvfs2-client-core(__strtoll_internal+0x42) [0x40cc5a]
...
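For sending logs to the list, a small filter that keeps only the error messages and drops the [bt] backtrace frames might help. This is a minimal Python 3 sketch that assumes each client log entry sits on a single line in the "[E HH:MM:SS.micro] message" form seen above (the wrapping in mail is from the mail client). The script and function names are hypothetical, not part of PVFS2.

```python
import re
from typing import List, Tuple

# Assumption: one pvfs2-client-core log entry per line, formatted like
# "[E 08:05:59.628334] message", as in the excerpt above.
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\] (?P<msg>.*)$"
)


def error_messages(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for "[E ...]" lines, skipping [bt] frames."""
    results = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # not an error-level entry in the assumed format
        msg = m.group("msg")
        if msg.startswith("[bt]"):
            continue  # backtrace frames are noisy without matching symbols
        results.append((m.group("time"), msg))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python3 pvfs_errors.py CLIENT_LOG [CLIENT_LOG ...]
    for path in sys.argv[1:]:
        with open(path) as fh:
            for ts, msg in error_messages(fh.read().splitlines()):
                print(ts, msg)
```

The client log path is whatever pvfs2-client-core was configured to log to; it is not assumed here.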
Does anything click? The server logs are quiet, though.
All machines have the following configuration:
db     : Berkeley DB 4.2
kernel : 2.6.5-7.244-smp
arch   : x86_64
distro : SLES 9
PVFS is running over IB, and I don't think it would be possible to
check with TCP. We have used both TCP and IB with pvfs2-1.5 on a smaller
cluster. This is a different cluster and a new PVFS installation, so
there are a lot of new variables.
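With that many new variables, it may be worth confirming that every node really matches the configuration above, including the uid/gid and 32/64-bit questions Murali raises in the quoted mail. A minimal Python 3 sketch, assuming you can run it on each node (for example over ssh) and collect the JSON it prints; `local_facts` and `mismatched_keys` are hypothetical helper names:

```python
import json
import os
import platform
import struct
from typing import Dict, List


def local_facts() -> Dict[str, str]:
    """Collect per-node facts worth comparing (run once on every node)."""
    return {
        "uid": str(os.getuid()),
        "gid": str(os.getgid()),
        "kernel": platform.release(),
        "arch": platform.machine(),
        # Pointer width of this interpreter: 64 on a 64-bit userland.
        "pointer_bits": str(struct.calcsize("P") * 8),
    }


def mismatched_keys(facts_by_host: Dict[str, Dict[str, str]]) -> List[str]:
    """Return keys whose values differ between hosts (a missing key counts)."""
    keys = set()
    for facts in facts_by_host.values():
        keys.update(facts)
    return sorted(
        k
        for k in keys
        if len({facts.get(k) for facts in facts_by_host.values()}) > 1
    )


if __name__ == "__main__":
    print(json.dumps(local_facts(), sort_keys=True))
```

Load each node's JSON into a dictionary keyed by host name and pass it to `mismatched_keys`; any key it returns (for example `uid` or `kernel`) differs on at least two nodes.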
I think I'm going to re-install the whole thing with pvfs-2.6.1, just to
make sure, and try again. I had avoided this since it's an expensive step
for us time-wise, as it means getting the IT guys involved.
Thanks
Vikrant
> Hi Vikrant,
> Do all these machines bind to the same NIS domain/group/server?
> Is it possible that your uids/gids etc. don't match up on all the
> different machines?
> We have had a problem where servers would rely on NIS to set things
> up correctly, which Sam has fixed in HEAD. I am not sure if that is
> what you are seeing here... could be wrong though.
> What distro, Berkeley DB version, and kernel version are you running? Do you
> see anything in the client-kernel logs or the server logs
> (pvfs2-server.log)? Are all the machines 32 bit or 64 bit or a
> mixture?
> There is something really wrong with your setup... something as simple as
> this should work.
> BTW: are you using IB, or can this problem be repro'ed with TCP as well?
> thanks for the reports!
> Murali
>
>
>> Some more info about the issue mentioned below:
>> I can now reproduce this problem consistently by just creating a file on
>> specific machines, and it seems to depend on whether that particular
>> machine has just the client running or both server and client running.
>> My PVFS configuration is as follows:
>> running only client : deva02 and deva03
>> running both client and server : deva{04-11}
>>
>> So if I create a file on any machine in the second group (running both
>> client and server), it is not accessible from the first group (trying to
>> ls that file gives an "Invalid argument" error).
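"Invalid argument" is the message for EINVAL, which differs from the ENOENT ("No such file or directory") you would expect if the file simply had not shown up yet. A minimal Python 3 sketch that calls stat() on a path and prints the errno name makes it easy to run the same check on each node and compare. This assumes ls fails in the same stat()-style lookup; the script is illustrative, not a PVFS2 tool:

```python
import errno
import os


def stat_status(path: str) -> str:
    """Return "ok", or the symbolic errno name (e.g. "EINVAL") from stat()."""
    try:
        os.stat(path)
    except OSError as exc:
        # errno.errorcode maps numbers to names such as "ENOENT" or "EINVAL".
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(p, stat_status(p))
```

For example, `python3 stat_status.py /mnt/pvfs2/vsk/fl5l2/test.jou` on deva02 and on deva04 should show whether one side gets EINVAL while the other gets "ok".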
>>
>> Guys, any clue what's happening?
>>
>> Thanks
>> Vikrant
>>
>>
>>> Hi,
>>>
>>> This is the layout of the file system:
>>> 11 clients on deva{02-11}
>>> 8 servers on deva{04-11}, each node has four cores.
>>>
>>> I created this file on deva02 and tried to look for it on deva04;
>>> this is the error I get:
>>>
>>> vsk at deva04:/mnt/pvfs2/vsk/fl5l2$ ls test.jou
>>> ls: test.jou: Invalid argument
>>>
>>> On deva02 it lists the file correctly. I have waited much longer
>>> than 30 seconds for this (many minutes, and now days). This does not
>>> always happen; usually things work fine. I'm not sure exactly what
>>> triggers this situation. I had to dig into the console
>>> history to get this output.
>>>
>>> This is with proprietary code, but I will try to send you some
>>> sample MPI code that shows a similar problem soon. We have used this
>>> code successfully with a previous installation of pvfs2-1.5, so it looks
>>> like either an installation issue or a bug in the current release.
>>> Would the config files and configure options for this installation
>>> help you to identify whether it's an installation issue?
>>>
>>> Thanks
>>> Vikrant
>>>
>>> Sam Lang wrote:
>>>
>>>> Hi Vikrant,
>>>>
>>>> Along with MPI code, if you could send us the output of your shell
>>>> commands and the errors you see that would also be helpful in debugging.
>>>>
>>>> Thanks,
>>>>
>>>> -sam
>>>>
>>>> On Dec 20, 2006, at 11:44 AM, Robert Latham wrote:
>>>>
>>>>
>>>>> On Wed, Dec 20, 2006 at 06:55:22PM +0530, Vikrant Kumar wrote:
>>>>>
>>>>>> With MPI applications it fails at certain times in MPI_File_open on
>>>>>> some nodes, which again looks similar to the above problem.
>>>>>>
>>>>>> Can you guys suggest how I can isolate the problem?
>>>>>> Let me know what information you require.
>>>>>>
>>>>> Oh, one more thing that would help is if you can send us the MPI code
>>>>> you are using. If we can reproduce the problem on our end, that will
>>>>> make debugging and fixing a lot easier.
>>>>>
>>>>> ==rob
>>>>>
>>>>> --Rob Latham
>>>>> Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
>>>>> Argonne National Lab, IL USA B29D F333 664A 4280 315B
>>>>> _______________________________________________
>>>>> Pvfs2-users mailing list
>>>>> Pvfs2-users at beowulf-underground.org
>>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>
>>>>>
>>>
>>
>>