[Pvfs2-developers] Missing op_ids in pvfs2-ping?
Scott Atchley
atchley at myri.com
Wed Dec 27 13:38:47 EST 2006
On Dec 21, 2006, at 4:57 PM, Pete Wyckoff wrote:
> atchley at myri.com wrote on Thu, 21 Dec 2006 16:26 -0500:
>> On Dec 21, 2006, at 3:59 PM, Pete Wyckoff wrote:
>>> atchley at myri.com wrote on Thu, 21 Dec 2006 15:50 -0500:
>>>> Client posts a receive with op_id 5, bmi tag 1 and length 32808
>>>> Client posts an unexpected send with op_id 7, bmi tag 1 and
>>>> length 24
> [..]
>>>> Server receives unexpected recv with bmi tag 1 and length 24
>>>> Server posts an expected send with op_id 79, bmi tag 1 and
>>>> length 816
> [..]
>>>> On the Client:
>>>> [E 15:40:10.538206] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 4.
>>>> [E 15:40:10.538421] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 6.
> [..]
>>>> On the Server:
>>>> [E 12/21 15:40] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 78.
> [..]
>> I did not think the op_ids would match, but bmi_mx does not see the
>> timed out ops in any post_send or post_recv functions. Are these
>> operations passing through bmi_mx (possibly via other BMI_meth_*
>> functions) or are these unrelated to bmi_mx?
>
> IDs are assigned to jobs. IDs are also assigned to BMI operations.
> They share the same number space but are different things. A job
> may require a few BMI operations to go to completion, and perhaps a
> few disk operations. Job id 78 seems to require BMI id 79, for
> instance.
Ok, it helps if you set *outcount in BMI_meth_testcontext() to let
BMI know when you have completed something. ;-)
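For anyone hitting the same symptom, here is a minimal sketch of the pattern, not the actual bmi_mx code. The names sketch_op, sketch_op_id_t and sketch_testcontext are hypothetical, and the argument list is much shorter than the real BMI method interface. It only illustrates that the caller learns about completions through *outcount, so if the method never writes it, the layer above sees nothing finish and the job eventually times out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real PVFS2/BMI types. */
typedef long sketch_op_id_t;

struct sketch_op {
    sketch_op_id_t id;  /* id handed back to the caller when the op was posted */
    int done;           /* set to 1 once the network layer has finished the op */
    int reported;       /* set to 1 once the completion was reported upward */
};

/*
 * Report up to incount completed operations from ops[0..nops).
 * Completed ids are written to out_ids, and the number of ids written
 * is stored in *outcount. Returns 0 on success.
 */
int sketch_testcontext(struct sketch_op *ops, size_t nops, int incount,
                       sketch_op_id_t *out_ids, int *outcount)
{
    int found = 0;

    assert(ops != NULL || nops == 0);
    assert(out_ids != NULL && outcount != NULL);

    for (size_t i = 0; i < nops && found < incount; i++) {
        if (ops[i].done && !ops[i].reported) {
            out_ids[found++] = ops[i].id;
            ops[i].reported = 1;  /* never report the same completion twice */
        }
    }

    /* The easy step to forget: without this store the caller's count is
     * never updated, so it cannot tell that anything completed. */
    *outcount = found;
    return 0;
}
```

With two posted operations, ids 5 and 7, where only 7 has finished, a first call stores 1 in *outcount and 7 in out_ids[0]. A second call stores 0, since that completion was already reported.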
After fixing other miscellaneous bugs, I now get:
% pvfs2-ping -m /mnt/pvfs2
(1) Parsing tab file...
(2) Initializing system interface...
(3) Initializing each file system found in tab file: /etc/pvfs2tab...
PVFS2 servers: mx://fog33:0:3
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
/mnt/pvfs2: Ok
(4) Searching for /mnt/pvfs2 in pvfstab...
PVFS2 servers: mx://fog33:0:3
Storage name: pvfs2-fs
Local mount point: /mnt/pvfs2
meta servers:
mx://fog33:0:3
data servers:
mx://fog33:0:3
(5) Verifying that all servers are responding...
meta servers:
mx://fog33:0:3 Ok
data servers:
mx://fog33:0:3 Ok
(6) Verifying that fsid 1318064247 is acceptable to all servers...
Ok; all servers understand fs_id 1318064247
(7) Verifying that root handle is owned by one server...
Root handle: 1048576
Ok; root handle is owned by exactly one server.
zsh: segmentation fault (core dumped) pvfs2-ping -m /mnt/pvfs2
The segfault is in my cleanup code and I am looking into it.
Scott