[Pvfs2-users] pvfs2 stability
Andrea Carotti
and.carotti at farmchim.uniba.it
Wed May 24 05:57:58 EDT 2006
Hi again,
the problem seems to be resolved.
The Fortran errors came from wrong settings in a program.
By the way, I agree that some cron jobs are critical.
Two weeks ago I was having problems with parallel computations using
mpich2: all the processes died at 4:02 am.
I wrote to the developers and they told me that prelink (in the
cron.daily dir, of course!) was causing the problem.
I think that scripts of this kind are critical in some environments, so for
the moment I've disabled all the cron.daily activities.
Bye
Andrea
----- Original Message -----
From: "Rob Ross" <rross at mcs.anl.gov>
To: "Mark Bartelt" <mark at cacr.caltech.edu>
Cc: "Andrea Carotti" <and.carotti at farmchim.uniba.it>;
<pvfs2-users at beowulf-underground.org>
Sent: Wednesday, May 24, 2006 5:25 AM
Subject: Re: [Pvfs2-users] pvfs2 stability
> Hi Mark,
>
> We are worried that the failures happened at all. We will be repeating
> this here to see if we can replicate, and if so, if we can fix it...
>
> Rob
>
> Mark Bartelt wrote:
>>>> Fedora comes with some cron jobs activated,
>>>> in particular the cron.daily:
>>
>> Indeed; not just Fedora, but most (all?) Linux
>> distros seem to come littered with all sorts of
>> cron jobs of questionable value. The old UNIX
>> "minimalist" approach (letting people add stuff
>> if they wanted) seems to have been replaced with
>> a "let's try to do nearly everything" one, which
>> forces people to remove things they don't really
>> _want_ to have enabled. But I'll stop ranting ...
>>
>> My point is that we were burned by this, thanks
>> to SuSE's /etc/cron.daily/updatedb (performs the
>> same function as /etc/cron.daily/slocate.cron on
>> Fedora): We'd seen horribly sluggish performance
>> on our PVFS filesystems once every day, until we
>> realized it was "updatedb" crunching through tens
>> of terabytes of files. And it was even worse, as
>> "updatedb" fired off at the same time on close to
>> ninety systems at once!
>>
>> The fix was obvious, namely adding "/pvfs" to the
>> "UPDATEDB_PRUNEPATHS" in /etc/sysconfig/locate (on
>> Fedora, it's "PRUNEPATHS" in /etc/updatedb.conf).
>>
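For anyone who wants to script the change Mark describes, here is a minimal Python sketch. It assumes the prune list is stored as a single double-quoted, space-separated assignment, which is the usual form but should be checked against the actual file. The function name add_prune_path and the sample text in the usage note are hypothetical, made up for illustration.

```python
import re


def add_prune_path(conf_text, path="/pvfs", key="PRUNEPATHS"):
    """Return conf_text with `path` appended to the quoted list assigned to `key`.

    Assumes a line of the form KEY="a b c" (optionally prefixed by `export`
    and with optional spaces around `=`). Hypothetical helper, not part of
    any locate/updatedb package.
    """
    pattern = re.compile(
        r'^([ \t]*(?:export[ \t]+)?' + re.escape(key) + r'[ \t]*=[ \t]*")([^"]*)(")',
        re.MULTILINE,
    )

    def _append(match):
        entries = match.group(2).split()
        if path not in entries:  # keep the edit idempotent
            entries.append(path)
        return match.group(1) + " ".join(entries) + match.group(3)

    new_text, count = pattern.subn(_append, conf_text, count=1)
    if count == 0:
        raise ValueError("no quoted %s assignment found" % key)
    return new_text
```

On a SuSE-style file one would pass key="UPDATEDB_PRUNEPATHS" and feed in the contents of /etc/sysconfig/locate; on Fedora the default key with /etc/updatedb.conf should do. The function only transforms text, so the result can be inspected before it is written back.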
>> We didn't see any failures (network, PVFS, or any
>> other issues) per se; just awful performance until
>> we told the cron job not to descend into the /pvfs
>> hierarchy.
>>
>> So if I were you, I'd still be a bit worried about
>> the fact the failures happened at all. It might be
>> worth doing some controlled heavy pounding of your
>> PVFS hierarchy (e.g. just a massive "find" to walk
>> the entire PVFS filesystem; or better, a bunch of
>> them going on simultaneously, launched from a lot
>> of different systems) to see whether the problems
>> recur ...
>> _______________________________________________
>> Pvfs2-users mailing list
>> Pvfs2-users at beowulf-underground.org
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
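The "controlled heavy pounding" Mark suggests could be sketched roughly as below: several walkers traverse the same tree at once, doing an lstat on every entry, much like a bunch of simultaneous "find" runs. This is only an illustration under stated assumptions: the names walk_tree and pound are hypothetical, the walker count of 8 is arbitrary, and the script covers a single host only, so it would still have to be launched from many systems to match the scenario described.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def walk_tree(root):
    """Walk `root` the way find would, lstat-ing every entry.

    Returns (entries_seen, errors). Errors are counted rather than raised,
    so a failing filesystem shows up in the totals instead of stopping the run.
    """
    entries = 0
    errors = 0

    def on_error(exc):
        # Called by os.walk when a directory cannot be listed.
        nonlocal errors
        errors += 1

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                entries += 1
            except OSError:
                errors += 1
    return entries, errors


def pound(root, walkers=8):
    """Run `walkers` simultaneous traversals of `root`; return one result per walker."""
    with ThreadPoolExecutor(max_workers=walkers) as pool:
        return list(pool.map(walk_tree, [root] * walkers))
```

Calling pound("/pvfs") and comparing the per-walker counts would give a rough idea of whether the load alone produces errors; any nonzero error count, or walkers disagreeing on the number of entries, would be worth looking into.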