[Pvfs2-developers] Review: experimental readcaching code

Rob Ross rross at mcs.anl.gov
Mon Nov 6 16:11:22 EST 2006


Hi all,

We always try to get good documentation on our code, but I don't think 
that it is fair to say that the older code is much better than the newer 
code. Except that maybe we've had more time go to back and add comments.

I'm not sure what you mean by well written code being a "cultural 
thing". That phrase has *way* too many interpretations to have been a 
good choice of terms here. I do think that we have the opportunity to 
create good habits within the "PVFS culture," if there is such a thing :).

The case that spurred this discussion was an *experimental* patch, so I 
think your approach to suggesting better comments was particularly 
inappropriate. I agree that getting the documentation into the code 
prior to acceptance would be a good thing to do, and certainly bits 
related to the kernel can be very confusing.

Thanks for your work Murali!

Regards,

Rob

Walter B. Ligon III wrote:
> I'm sorry, you must be smoking dope!  Those two bits of code have 
> significantly more comments than some other bits.  They aren't even in 
> the same league.  Again, as I said, none of it is perfect, certainly not 
> mine, but taking the attitude that there is poorly presented code,
> therefore we might as well not worry about it at all is not encouraging.
> 
> Well written code is a cultural thing.  I comment my code when I write 
> it, because I know I will never go back and do it later.  I only wish I 
> knew how to produce better documentation.
> 
> Just my opinion, I don't expect you to share it.
> 
> Walt
> 
> Sam Lang wrote:
>>
>> On Nov 2, 2006, at 9:12 AM, Walter B. Ligon III wrote:
>>
>>> Is there a particular reason this isn't documented IN THE CODE?
>>> There's this thing called comments, that's what it's for!
>>>
>> Hi Walt,
>>
>> I completely agree that documentation should be a high priority, but  
>> aren't we all guilty of not doing it as much as we should?  I've  
>> slogged through some pretty hairy bits of request processing and  
>> state machine code that I wished had a few more comments.
>>
>> In terms of patches, its nice to have comments in there, but in the  
>> end vetting should be done on the code, not the comments.  I would  
>> argue that this should be handled on a case by case basis.  If you  
>> get a patch you don't understand, ask for clarification as Pete has  
>> done in this case.
>>
>> -sam
>>
>>
>>> After slogging through some of the more recent code (like the  
>>> client-core) I'm becomming of the opinion that we should not accept  
>>> patches or new code that isn't properly commented.  The original  
>>> code base was much better (though not perfectly) commented.   Letting 
>>> thousands of lines of code creep in uncommented like that  is a bad 
>>> trend.
>>>
>>> Just a thought ...
>>>
>>> Walt
>>>
>>> Murali Vilayannur wrote:
>>>
>>>> Hi Pete,
>>>>
>>>>> I must have misunderstood then.  I thought that "pages" was a list
>>>>> of all the ones needed for this particular access, while
>>>>> "issue_pages" was the (possibly empty) subset that needed to have IO
>>>>> started on them.  Then if IO failed, you would just want to kill off
>>>>> the issue_pages, not the entire set.  But sounds like you know
>>>>> what's going on.
>>>>
>>>> pages -> array of all pages needed for the access
>>>> issue_pages -> array of only those pages for which I/O needs to be  
>>>> done
>>>> page_list -> linked list of pages for which I/O is needed. this is
>>>> required by the read_cache_pages() function and is drained after the
>>>> cb finishes for each page.
>>>> Hopefully, I have done the right thing there. I will double check  
>>>> today.
>>>>
>>>>>
>>>>> >> Little stuff:  no need to cast return value of kmalloc().   There's
>>>>> >> also kzalloc() if you're going to do a memset(,0,) next.
>>>>> >
>>>>> > Is kzalloc() available on older kernels or should I add a  
>>>>> configure time
>>>>> > option for that?
>>>>> > Will change this too..
>>>>>
>>>>> True, it's somewhat recent.  Ho hum.  Could write your own kzalloc()
>>>>> and use the config option, or just not bother and use the older
>>>>> interfaces.  I don't care.  Probably either is easier than
>>>>> submitting the code to the kernel so they maintain future changes
>>>>> like this.  :)
>>>>
>>>> yeah.. I think I will add the configure option and kzalloc stuff.
>>>>
>>>>> Or some magic per-directory opt-in xattr flag.  I like that
>>>>> thinking, as it parallels the immutable flag usage nicely.
>>>>
>>>> Exactly.
>>>>
>>>>>
>>>>> > In order to do write buffering efficiently, we should try to  
>>>>> avoid doing a
>>>>> > read before we modify the page and also keep track of the dirty
>>>>> > ranges, but that seems like a hard thing to do from a module  
>>>>> without vfs
>>>>> > changes. Aggregating writebacks for efficiency is also
>>>>> > an issue, but I suspect we could look at the ->writepages  
>>>>> callback and see
>>>>> > if that will work for us.
>>>>>
>>>>> "Avoid doing a read before we modify the page?"  Wouldn't it go
>>>>> something like this?  I likely misunderstand again:
>>>>>
>>>>>     read:
>>>>>         if IS_CTOO || IS_IMMUTABLE
>>>>>             if in page cache
>>>>>                 return data, dirty or not
>>>>>             else
>>>>>                 do IO
>>>>>     write:
>>>>>         if IS_CTOO
>>>>>             if in page cache
>>>>>                 update data, mark dirty
>>>>>             else
>>>>>                 do IO, then update and mark dirty
>>>>>
>>>>> and just let pagecache writeback take care of its business as  normal.
>>>>>
>>>>> Dirty ranges?  Nah, don't bother.  Just do full page granularity.
>>>>
>>>> Oh ok.. I guess if we do whole page granularity we might be ok.. not
>>>> sure. think of the following scenario,
>>>> I am mostly worried about correctness issues with caching even in the
>>>> presence of non-overlapping data.
>>>> Initial file size is 0.
>>>> Client 1 on node 1 writes to file offsets 0 to 3000 and client 2 on
>>>> node 2 from 3001 to 6000. With a page size of 4096, they both share
>>>> file block 0.. What ends up on disk for file block 0 with write
>>>> buffering? Sadly, we cannot punt and say user error since the program
>>>> was operating with nonoverlapping offsets.
>>>> Perhaps, we could ask such users/programs from not using the write
>>>> buffering algorithm..
>>>> Will MPI-IO programs be able to prevent this kind of thing from
>>>> happening magically?
>>>>
>>>>>
>>>>> And efficiency may just happen with the new writepages.  If it
>>>>> doesn't, I don't think we should go out of our way to make this
>>>>> usage model particularly fast.
>>>>
>>>> sounds good to me.
>>>>
>>>>>
>>>>> > BTW: Do not try the patch just yet. I have done zero testing on  
>>>>> it and it
>>>>> > might crash your machine.
>>>>>
>>>>> Heh.  I'm too busy crashing it myself with SCSI patches.
>>>>
>>>> Awesome. I am doing the same at work here :)
>>>>
>>>>> P.S.  Your gmail produces ugly mail.  Can you turn off HTML, and
>>>>> maybe get it to do quoting right?
>>>>
>>>> I turned off html. But i have no clue how to do the latter.
>>>> Thanks as always for the insightful reviews..
>>>> Murali
>>>> _______________________________________________
>>>> Pvfs2-developers mailing list
>>>> Pvfs2-developers at beowulf-underground.org
>>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
>>>
>>> -- 
>>> Dr. Walter B. Ligon III
>>> Associate Professor
>>> ECE Department
>>> Clemson University
>>> _______________________________________________
>>> Pvfs2-developers mailing list
>>> Pvfs2-developers at beowulf-underground.org
>>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>>>
> 


More information about the Pvfs2-developers mailing list