[Pvfs2-developers] Missing op_ids in pvfs2-ping?

Scott Atchley atchley at myri.com
Wed Dec 27 13:38:47 EST 2006


On Dec 21, 2006, at 4:57 PM, Pete Wyckoff wrote:

> atchley at myri.com wrote on Thu, 21 Dec 2006 16:26 -0500:
>> On Dec 21, 2006, at 3:59 PM, Pete Wyckoff wrote:
>>> atchley at myri.com wrote on Thu, 21 Dec 2006 15:50 -0500:
>>>> Client posts a receive with op_id 5, bmi tag 1 and length 32808
>>>> Client posts an unexpected send with op_id 7, bmi tag 1 and length 24
> [..]
>>>> Server receives unexpected recv with bmi tag 1 and length 24
>>>> Server posts an expected send with op_id 79, bmi tag 1 and length 816
> [..]
>>>> On the Client:
>>>> [E 15:40:10.538206] job_time_mgr_expire: job time out: cancelling bmi
>>>> operation, job_id: 4.
>>>> [E 15:40:10.538421] job_time_mgr_expire: job time out: cancelling bmi
>>>> operation, job_id: 6.
> [..]
>>>> On the Server:
>>>> [E 12/21 15:40] job_time_mgr_expire: job time out: cancelling bmi
>>>> operation, job_id: 78.
> [..]
>> I did not think the op_ids would match, but bmi_mx does not see the
>> timed out ops in any post_send or post_recv functions. Are these
>> operations passing through bmi_mx (possibly via other BMI_meth_*
>> functions) or are these unrelated to bmi_mx?
>
> IDs are assigned to jobs.  IDs are also assigned to BMI operations.
> They share the same number space but are different things.  A job
> may require a few BMI operations to go to completion, and perhaps a
> few disk operations.  Job id 78 seems to require BMI id 79, for instance.

Ok, it helps if you set *outcount in BMI_meth_testcontext() to let
BMI know when you completed something. ;-)
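
To illustrate the kind of mistake, here is a minimal sketch of a
testcontext-style poll. It is not the bmi_mx code: the sketch_* names,
the fixed-size queue, and the three-argument signature are all
assumptions made up for this example, and the real method takes more
arguments. The point is only that the caller learns about completions
through *outcount, so a poll that never writes it looks to the caller
as if nothing finished, and the job eventually times out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; this is not the real BMI op id type. */
typedef long sketch_op_id_t;

#define SKETCH_MAX_DONE 16

/* Completed operations waiting to be reported to the caller. */
static sketch_op_id_t done_queue[SKETCH_MAX_DONE];
static int done_count = 0;

/* Record a finished operation (what a completion handler might do). */
static int sketch_mark_done(sketch_op_id_t id)
{
    if (done_count >= SKETCH_MAX_DONE)
        return -1;
    done_queue[done_count++] = id;
    return 0;
}

/*
 * Simplified testcontext-style poll (assumed signature). Copies up to
 * incount completed ids into out_ids and reports how many were copied
 * through *outcount. If *outcount is never written, the caller cannot
 * tell that anything completed.
 */
static int sketch_testcontext(int incount, sketch_op_id_t *out_ids,
                              int *outcount)
{
    int n = 0;

    assert(out_ids != NULL && outcount != NULL);

    while (n < incount && n < done_count) {
        out_ids[n] = done_queue[n];
        n++;
    }

    /* Shift any unreported completions to the front of the queue. */
    for (int i = n; i < done_count; i++)
        done_queue[i - n] = done_queue[i];
    done_count -= n;

    *outcount = n;   /* tell the caller how many operations completed */
    return 0;
}
```

With two completions queued and incount of 1, the first call reports
one id and the next call reports the other; once the queue is empty,
*outcount comes back as 0.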

After fixing other miscellaneous bugs, I now get:

% pvfs2-ping -m /mnt/pvfs2

(1) Parsing tab file...

(2) Initializing system interface...

(3) Initializing each file system found in tab file: /etc/pvfs2tab...

    PVFS2 servers: mx://fog33:0:3
    Storage name: pvfs2-fs
    Local mount point: /mnt/pvfs2
    /mnt/pvfs2: Ok

(4) Searching for /mnt/pvfs2 in pvfstab...

    PVFS2 servers: mx://fog33:0:3
    Storage name: pvfs2-fs
    Local mount point: /mnt/pvfs2

    meta servers:
    mx://fog33:0:3

    data servers:
    mx://fog33:0:3

(5) Verifying that all servers are responding...

    meta servers:
    mx://fog33:0:3 Ok

    data servers:
    mx://fog33:0:3 Ok

(6) Verifying that fsid 1318064247 is acceptable to all servers...

    Ok; all servers understand fs_id 1318064247

(7) Verifying that root handle is owned by one server...

    Root handle: 1048576
      Ok; root handle is owned by exactly one server.

zsh: segmentation fault (core dumped)  pvfs2-ping -m /mnt/pvfs2

The segfault is in my cleanup code and I am looking into it.

Scott

