[Pvfs2-developers] Review: experimental readcaching code

Walter B. Ligon III walt at clemson.edu
Thu Nov 2 10:12:43 EST 2006


Is there a particular reason this isn't documented IN THE CODE?
There's this thing called comments, that's what it's for!

After slogging through some of the more recent code (like the 
client-core) I'm becomming of the opinion that we should not accept 
patches or new code that isn't properly commented.  The original code 
base was much better (though not perfectly) commented.  Letting 
thousands of lines of code creep in uncommented like that is a bad trend.

Just a thought ...

Walt

Murali Vilayannur wrote:
> Hi Pete,
> 
>> I must have misunderstood then.  I thought that "pages" was a list
>> of all the ones needed for this particular access, while
>> "issue_pages" was the (possibly empty) subset that needed to have IO
>> started on them.  Then if IO failed, you would just want to kill off
>> the issue_pages, not the entire set.  But sounds like you know
>> what's going on.
> 
> 
> pages -> array of all pages needed for the access
> issue_pages -> array of only those pages for which I/O needs to be done
> page_list -> linked list of pages for which I/O is needed. this is
> required by the read_cache_pages() function and is drained after the
> cb finishes for each page.
> Hopefully, I have done the right thing there. I will double check today.
> 
>>
>> >> Little stuff:  no need to cast return value of kmalloc().  There's
>> >> also kzalloc() if you're going to do a memset(,0,) next.
>> >
>> > Is kzalloc() available on older kernels or should I add a configure 
>> time
>> > option for that?
>> > Will change this too..
>>
>> True, it's somewhat recent.  Ho hum.  Could write your own kzalloc()
>> and use the config option, or just not bother and use the older
>> interfaces.  I don't care.  Probably either is easier than
>> submitting the code to the kernel so they maintain future changes
>> like this.  :)
> 
> 
> yeah.. I think I will add the configure option and kzalloc stuff.
> 
>> Or some magic per-directory opt-in xattr flag.  I like that
>> thinking, as it parallels the immutable flag usage nicely.
> 
> 
> Exactly.
> 
>>
>> > In order to do write buffering efficiently, we should try to avoid 
>> doing a
>> > read before we modify the page and also keep track of the dirty
>> > ranges, but that seems like a hard thing to do from a module without 
>> vfs
>> > changes. Aggregating writebacks for efficiency is also
>> > an issue, but I suspect we could look at the ->writepages callback 
>> and see
>> > if that will work for us.
>>
>> "Avoid doing a read before we modify the page?"  Wouldn't it go
>> something like this?  I likely misunderstand again:
>>
>>     read:
>>         if IS_CTOO || IS_IMMUTABLE
>>             if in page cache
>>                 return data, dirty or not
>>             else
>>                 do IO
>>     write:
>>         if IS_CTOO
>>             if in page cache
>>                 update data, mark dirty
>>             else
>>                 do IO, then update and mark dirty
>>
>> and just let pagecache writeback take care of its business as normal.
>>
>> Dirty ranges?  Nah, don't bother.  Just do full page granularity.
> 
> 
> Oh ok.. I guess if we do whole page granularity we might be ok.. not
> sure. think of the following scenario,
> I am mostly worried about correctness issues with caching even in the
> presence of non-overlapping data.
> Initial file size is 0.
> Client 1 on node 1 writes to file offsets 0 to 3000 and client 2 on
> node 2 from 3001 to 6000. With a page size of 4096, they both share
> file block 0.. What ends up on disk for file block 0 with write
> buffering? Sadly, we cannot punt and say user error since the program
> was operating with nonoverlapping offsets.
> Perhaps, we could ask such users/programs from not using the write
> buffering algorithm..
> Will MPI-IO programs be able to prevent this kind of thing from
> happening magically?
> 
>>
>> And efficiency may just happen with the new writepages.  If it
>> doesn't, I don't think we should go out of our way to make this
>> usage model particularly fast.
> 
> 
> sounds good to me.
> 
>>
>> > BTW: Do not try the patch just yet. I have done zero testing on it 
>> and it
>> > might crash your machine.
>>
>> Heh.  I'm too busy crashing it myself with SCSI patches.
> 
> 
> Awesome. I am doing the same at work here :)
> 
>> P.S.  Your gmail produces ugly mail.  Can you turn off HTML, and
>> maybe get it to do quoting right?
> 
> 
> I turned off html. But i have no clue how to do the latter.
> Thanks as always for the insightful reviews..
> Murali
> _______________________________________________
> Pvfs2-developers mailing list
> Pvfs2-developers at beowulf-underground.org
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University


More information about the Pvfs2-developers mailing list