[Pvfs2-developers] pvfs2-cp profile
Julian Martin Kunkel
Julian.Kunkel at web.de
Thu Jan 11 17:07:17 EST 2007
> Do you mean Pete's IB 4x numbers or my MX-10G numbers? Or both?
Both are good; sorry, I meant especially the throughput of the new ones.
> I am not sure that I follow you here. Ideally, I only want to measure
> network activity and PVFS2 overhead. I would prefer to avoid
> measuring disk activity but these old nodes do not have enough memory
> to use ramfs well.
> My MX-10G results are from newer nodes that have enough memory to use
> ramfs effectively. I can't keep them tied up for bmi_mx development
> as long as I can these older nodes. :-)
TAS simply discards data and handles metadata efficiently in memory. It also
differs in that it completes I/O jobs immediately, so the internal
handling in pvfs2 is a bit different. In the past this different view
sometimes helped the evaluation.
> > On the other hand you could use the pvfs2-hint-branch which
> > provides you with
> > better MPE logging on the server side, we have some tools to
> > convert and
> > merge client and pvfs2-server logs and show the results. I could
> > upgrade the
> > current pvfs2-hint-branch with your patches (which is currently
> > somewhere at
> > release 2.5). The reason we need parts of the advanced logging is
> > that logs
> > have a problem on the server side if multiple start events occur
> > before the
> > end events happen, for example if you use multiple flow streams.
> > @Scott
> > I have seen you solved it by using events and wonder which tool you
> > have used
> > to create states out of the events.
> > You said you have problems with the MPE log on the server, maybe we
> > could help
> > you if you give details ?
> I am using MPE. Since I am not using MPI, I compile mpich2 with
> CFLAGS="-DCLOG_NOMPI". I then add "-lmpe_nompi" to my LIBS and the
> path to mpich2 to my LDFLAGS.
> In my initialization function, I have:
> #if BMX_LOGGING
> send_start = MPE_Log_get_event_number();
> send_finish = MPE_Log_get_event_number();
> recv_start = MPE_Log_get_event_number();
> recv_finish = MPE_Log_get_event_number();
> sendunex_start = MPE_Log_get_event_number();
> sendunex_finish = MPE_Log_get_event_number();
> recvunex_start = MPE_Log_get_event_number();
> recvunex_finish = MPE_Log_get_event_number();
> MPE_Describe_state(send_start, send_finish, "Send", "red");
> MPE_Describe_state(recv_start, recv_finish, "Recv", "blue");
> MPE_Describe_state(sendunex_start, sendunex_finish,
> "SendUnex", "orange");
> MPE_Describe_state(recvunex_start, recvunex_finish,
> "RecvUnex", "green");
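For reference, the build steps described above might look roughly like the
following sketch. The prefix and paths are hypothetical examples, not taken
from the mail; adjust them to your own tree:

```shell
# Hypothetical sketch of the no-MPI MPE build described above.
# Build mpich2 with MPE's no-MPI logging support:
CFLAGS="-DCLOG_NOMPI" ./configure --prefix=/opt/mpich2
make && make install

# Then, when building the BMI module, link against the no-MPI MPE library:
LDFLAGS="-L/opt/mpich2/lib" LIBS="-lmpe_nompi" ./configure
```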
Ah, I see, so you use states and not events. That was a problem in earlier
versions of MPE (and I think it still is).
Assume you have one client and, for example, the category BMI_Send with
start and stop events. Now assume you actually see the following sequence on
one client or server (introduced by multiple parallel flows):
start, start, stop, stop
MPE should create two overlapping states, but it doesn't: it creates only
ONE state for both pairs of events, so the resulting logs are wrong! This
only happens when you get two overlapping states on one machine (i.e. on one
timeline).
Thus, we try to use the functions MPE_Log_get_solo_eventID and
MPE_Describe_info_event to create single events and distribute them onto
multiple timelines. We have a suite of slog2 transformation programs which
allow merging client and server logs and splitting overlapping actions onto
multiple timelines.
> I can now get server logs. My SERVER_LDFLAGS were wrong. Also, on the
> server, I had to specify an absolute path (I did not on the client).
> I would be interested in merging the logs if you can provide some
> tools or insight.
The question which arises is whether we could synchronize the
pvfs2-hint-branch with HEAD so that it has all the stuff you need.
If that is the case, I could do so and you could use our branch (which also
provides a patched pvfs2-cp to support the hints :).
It especially allows MPI runs to show JOB and TROVE operations; BMI
operations can be logged too, though not with a "Request ID", but this is not
important for the pvfs2-cp utility and could be integrated into the log, too.
You can find an excerpt of an mpi-io-test run with collective and contiguous
I/O regions and a jumpshot log here:
http://www.rzuser.uni-heidelberg.de/~jkunkel2/4S4C-level-1.jpg. (Note that
on these diagrams the separation of parallel states onto different timelines
is visible.)
Clients are timelines 0-3, 4 is the metadata server, and 6-8 are the data
servers.
The first operation creates the file; at the far right you can see some TROVE
write operations. If you move the mouse over a job you can see the pvfs2 job
in the third bar from the top (like CREATE); you can also see the types of
all unexpected messages (request decode) and the request type they belong to.
It should be possible, though, to merge your client log with the server log
from our environment (of the same run, of course). I think this will at least
allow you to see idle times and might help to find their source.
Our working group will provide a package with the tracing tools and some
instructions on how to use them, if you like, but we will need a bit of time.
However, I would then have to upgrade the pvfs2-hint-branch and patch it with
everything required to run MX. If you have patches for MX against HEAD, I
could upgrade the hint-branch to HEAD. Just tell me what you need for MX.