[lwhatley.ctr@navo.hpc.mil: Re: [Pvfs2-developers] Re: [Pvfs2-users] PVFS2 on Infiniband]
Lee Whatley, Contractor
lwhatley.ctr at navo.hpc.mil
Tue Jun 6 17:01:15 EDT 2006
Pete Wyckoff wrote:
> I think we can declare success and chalk it up to weird scheduler
> behavior. I'll run some tests myself, then probably stick that
> sched_yield in the source, as it should be harmless for more recent
> kernels in that location.
>
> The fact that mkdir is eating 10% CPU on the server is due to this
> continuous polling thing that's in there. Even when yielding the
> processor to the dbpf thread once in a while, the bmi thread will
> still busywait for 1 sec hoping to get some more action from the
> network. When we switch to event-driven rather than polling, this
> should go away (or at least be configurable to go away).
Pete,

Well, unfortunately, I don't think we can declare victory quite yet. The
sched_yield() fix lets me mount the filesystems normally over
InfiniBand and run mkdirs and rms to my heart's content.

Unfortunately, doing any kind of real work still causes problems.
For example, I tried untarring a 13 MB gzipped tar file into /u/data1
(with tar -xvzf). This takes less than 10 seconds on a local filesystem
(and even on an NFS filesystem). It took nearly 5 minutes on the PVFS2
filesystem! The tar started out fine, with pvfs2-client and pvfs2-server
each using about 10% of the CPU. Then those numbers started to climb,
and the time between successive files being written grew at what
seemed to be an exponential rate. By the time the archive had finished
extracting, the system load was around 2.5 and pvfs2-server and
pvfs2-client were using 100% of the CPU. I/O wait was 0, so it wasn't
waiting on the disk.

So something still isn't quite right somewhere. Just let me know what
data would be useful for you to look at, and I'll be more than happy to
provide it.

Thanks!
-Lee