[Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
Rob Ross
rross at mcs.anl.gov
Fri Jun 29 07:52:59 EDT 2007
Thanks for the very detailed followup email. -- Rob
Mark Van De Vyver wrote:
> Hi,
> After two new PSUs and two new CPUs, it seems I've resolved this issue...
>
> The inconsistent file copy results are not caused by PVFS2.
>
> The network centric nature of PVFS2 means that it might seem as if
> PVFS2 is at fault. However it simply triggers some problem with the
> sata_nv and timer code in some linux kernels, and on some hardware.
> In case it helps I've summarized my experience.
>
> Apparent symptom:
> ----------------------------
> - Files copied to the PVFS2 area might fail a diff or cmp check
> (see thread below).
> - Typically this occurs when:
> a) large files are copied and
> b) several clients are copying/reading to the PVFS2 area.
> - no errors were reported in /var/log/messages (but you might see
> reports about lost ticks or cpu frequency changes)
>
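A minimal sketch, not part of the original message, of how one might scan a log for the tick and CPU-frequency reports mentioned above. The function name `count_tick_warnings` and the search pattern are assumptions; the exact wording of these kernel messages varies between versions, so adjust the pattern to what your logs actually contain.

```shell
#!/bin/sh
# count_tick_warnings LOGFILE   (hypothetical helper, not from the thread)
# Prints the number of lines that look like lost-tick or CPU-frequency
# reports. The pattern is a guess at typical wording, not an exact match
# for any particular kernel version.
count_tick_warnings() {
    grep -ciE 'lost.*tick|losing.*tick|cpu frequency' "$1"
}

# Scan the usual syslog file if it is readable on this machine.
if [ -r /var/log/messages ]; then
    echo "tick-related lines: $(count_tick_warnings /var/log/messages)"
fi
```

A count of zero does not rule the problem out; it only means nothing matching the pattern was logged.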
> Real symptom:
> ----------------------
> - The disks are being placed under load when the network connection
> is also under some load.
>
> Related reports:
> ----------------------
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=55223
> http://lists.linuxcoding.com/kernel/2006-q1/msg21399.html
>
> How I diagnosed:
> ------------------------
> - kernel boot parameters:
> report_lost_ticks apic=debug mce=bootlog showopts
>
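A small sketch, assuming a Linux system that exposes `/proc/cmdline`, for confirming which of the boot parameters named in this message actually reached the running kernel. The helper name `has_boot_param` is made up for illustration.

```shell
#!/bin/sh
# has_boot_param CMDLINE PARAM   (hypothetical helper, not from the thread)
# Succeeds if PARAM appears in CMDLINE as a whole word or as PARAM=value.
has_boot_param() {
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
    esac
    return 1
}

# Report on the parameters mentioned in this message for the running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null)
for p in report_lost_ticks apic mce showopts noapic no_timer_check; do
    if has_boot_param "$cmdline" "$p"; then
        echo "$p: present"
    else
        echo "$p: absent"
    fi
done
```

Matching on whole words matters here: a plain substring search for `apic` would also match `noapic`.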
> Conjectured Workaround
> -----------------------------------
> The boot parameters below allowed me to download, compile and install
> a new kernel. They may or may not remedy the inconsistent file copy
> results....
> - Add kernel boot parameter (severe, and gave me boot-up problems):
> noapic
> - Or, less severe, and worked for me, add:
> no_timer_check
>
> Solution:
> ------------
> - Upgrade to kernel 2.6.21 (or more recent?, i.e. I'm using 2.6.21.5).
> No kernel parameters need be passed, e.g. can drop the no_timer_check.
>
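A rough sketch, not from the original message, for checking whether a node is already running 2.6.21 or later. The helper `version_ge` is hypothetical and compares numeric dotted fields only; any suffix in `uname -r` (such as `-default`) is stripped before the comparison.

```shell
#!/bin/sh
# version_ge A B   (hypothetical helper, not from the thread)
# Succeeds if dotted numeric version A is greater than or equal to B.
# Missing fields count as 0, so 2.6.21 is treated as 2.6.21.0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        fa=${fa:-0}
        fb=${fb:-0}
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
        # Drop the field just compared; stop when no dot is left.
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Keep only the leading digits-and-dots part of the running kernel version.
running=$(uname -r | sed 's/[^0-9.].*$//')
if version_ge "$running" 2.6.21; then
    echo "kernel $running: 2.6.21 or newer"
else
    echo "kernel $running: older than 2.6.21"
fi
```

A newer version number alone does not prove the problem is gone, so the copy and cmp checks are still worth repeating after an upgrade.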
> System:
> ------------
> - 3 quad opteron 852 supermicro servers (motherboard h8qce+)
> - each with 3 SATA drives arranged as a 3-stripe LVM volume, formatted with xfs
> (openSUSE10.2 defaults)
> - This may be specific to the nVidia ck804 chipset and/or the AMD
> 64bit processors (?)
>
> Comments:
> ----------------
> I saw these inconsistent copy results using Rocks Cluster 4.2 and pvfs
> 1.5.1 and pvfs 2.6.2. Then I switched to openSUSE 10.2 (kernel
> 2.6.18) and pvfs 2.6.3 before discovering that this was likely a
> kernel problem. So the above workaround and following description
> applies to the openSUSE kernel 2.6.18. I did have the suse install
> fail a couple of times when doing the online update - suggesting the
> 'bad copy' problem was still evident on my hardware with the openSUSE
> 10.2 default x86_64 kernel.
>
> Booting with the kernel parameters described above, I got
> boot.msg.1 (attached)
> This is the openSUSE 10.2 default updated kernel... many 'lost ticks'
> but the lost ticks that were around when the sata_nv is called
> (around line 504) are gone. I don't have a copy of the boot.msg
> showing lost ticks where sata_nv is called, so you'll have to take my
> word :)
>
> Anyway, using the 'no_timer_check' boot parameter I was able to
> download a 2.6.22.rc4 kernel from the suse build factory... this
> allowed me to build the 2.6.21.5 kernel from the sources. The
> 2.6.22.rc4 didn't report any lost ticks (see boot.msg.2.6.22.rc4) and
> I installed it a couple of times to try and make sure the update was
> 'clean/good'.
> I compiled the 2.6.21.5 kernel configured for the AMD opteron rather
> than the generic x86_64 because of the exceptions (see line 401):
>
> <4>ACPI Exception (processor_core-0781): AE_NOT_FOUND, Processor
> Device is not present [20070126]
>
> I haven't stuck with the 2.6.22.rc4 kernel because of the:
>
> <3>ck804xrom ck804xrom_init_one(): Unable to register resource
> 0x00000000ffc00000-0x00000000ffffffff - kernel bug?
>
> and because of the PVFS2 FAQ and the remark in an earlier pvfs2-users
> thread that the kernel can be very fluid in the RC phase.
>
> Upshot is, a (moderately customized) 2.6.21.5 kernel seems to
> eliminate this observed behavior... 300GB copied and no diff or cmp
> has failed.... fingers crossed for the next 700GB.
>
> The Redhat bug report thread (link is above) suggests this has been a
> problem that has dropped in and out of the kernel, so this might be
> worth adding to any 'list of suspects' you bear in mind. Though it
> looks like the kernel gurus gave the timer code a work-over for the
> 2.6.21 release.
>
> Thanks again for everyone's kind help. I hope this saves someone
> replacing hardware and saves some time or prevents premature aging :)
>
> Regards
> Mark
>
> On 3/5/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
>> Hi Sam,
>>
>> Apologies if I should have brought some of the following up earlier:
>> My machines are Supermicro quad Opteron 852 processor machines (all
>> three).
>> I took care to specify only the supermicro certified memory modules,
>> and I've run memtest86+ for over 5 hours on all three machines and all was
>> OK.
>> I've simultaneously run cpuburn-in on each of the 4 processors (ie
>> 100% load) - after 15 minutes all was OK on all machines, but I'll do
>> a longer test when I get the chance.
>>
>> Responses inline.
>>
>> > > ---snip---
>> >
>> > As Murali mentioned in a previous email, PVFS doesn't provide file
>> > locks, but if you're accessing your tables from just the frontend
>> > node, that shouldn't be a problem. That being said, while a MySQL
>> > server might run successfully on a PVFS mount, the performance of
>> > such a setup is likely going to be worse than a distributed MySQL
>> > setup. PVFS will stripe the tables over the IO servers in what
>> > undoubtedly ends up being a non-optimal fashion.
>>
>> Yup, I'm likely to need a large db - and wanted the storage area
>> 'headroom' more than high throughput.
>> But I may reconsider this.
>>
>> > >
>> > >> Could you send us the fs.conf file you're using on each of the
>> > >> nodes?
>> > >
>> > > I've attached the config files - these were in /opt/pvfs2/etc; my
>> > > understanding is they are what Rocks distributes to the nodes as part
>> > > of the PVFS2 Roll configuration step.
>> >
>> > First off, the PVFS2 roll configuration is a bit wonky. They appear
>> > to call metadata servers 'frontend' nodes, and IO servers 'compute'
>> > nodes. This means that you're only using the frontend node disk for
>> > metadata (which really doesn't come close to its ~ 1TB capacity), and
>> > your two compute nodes hold all the actual file data. It sounds like
>> > you really want to distribute all the data amongst the three nodes in
>> > your system (frontend, compute0, compute1), and use one of them as
>> > the metadata server as well. You can do this by modifying the
>> > fs.conf so that 'frontend' appears in the <DataHandleRanges> context
>> > with a unique range for its IO handles (something outside the ranges
>> > already chosen by compute0 and compute1).
>>
>> Thanks for those comments - I'll make that change. I assume I don't
>> need to backup-restore the pvfs2 area for this kind of change - i.e.
>> existing data will be unaffected?
>>
>> > As for the diff of files failing, it seems odd that it's happening
>> > _after_ the diff has already succeeded.
>>
>> Correct, and only when several pvfs servers are copying/reading
>> data, and with large or binary files.
>>
>> >
>> > Is it possible that when the dvd is mounted again, the filenames or
>> > their content have somehow changed between the last time it was
>> > mounted? Or with the way you grab the dvd labels off the disk using
>> > dd, they somehow end up being the same, and the filenames, are the
>> > same on different dvds, but their contents differ?
>>
>> Unfortunately not. Each DVD's content is copied to a sub-directory
>> that is the DVD label.
>> The DVD labels are definitely not the same for the DVD-A, B, C case
>> described.
>> Between when a file passes the diff/cmp check and when it fails, the
>> only difference seems to be that the pvfs infrastructure has been put
>> under increased load - but I seem to be unique with this :(
>>
>> > It would be helpful to know how the diff fails. Would it be possible
>> > to abort the script at the first sign of a diff returning non-zero,
>> > and look at the differences between the files? How much of the file
>> > on the PVFS volume is corrupted? Is it all at the end, etc.?
>>
>> In this respect cmp is more helpful since it reports the byte at which
>> the difference occurs - on one occasion I noticed that it was not
>> always the same byte when consecutive calls to cmp were made for the
>> same file.
>> We have an electrical storm here and these machines are not on a UPS,
>> so I've shut everything down - I'll try to confirm this tomorrow or
>> shortly thereafter.
>>
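One way to test the "not always the same byte" observation is to run cmp several times and log the first differing offset each time. This is a sketch, not the script used in the thread; the helper names are hypothetical, and it relies on `cmp -l`, which lists the offset of each differing byte.

```shell
#!/bin/sh
# first_diff_offset FILE1 FILE2   (hypothetical helper, not from the thread)
# Prints nothing if the files are identical, the 1-based offset of the
# first differing byte if there is one, or "EOF" if one file is simply a
# prefix of the other.
first_diff_offset() {
    if cmp -s "$1" "$2"; then
        return 0
    fi
    off=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}')
    echo "${off:-EOF}"
}

# repeat_cmp FILE1 FILE2 COUNT
# Runs the comparison COUNT times and prints one result line per run.
repeat_cmp() {
    i=1
    while [ "$i" -le "$3" ]; do
        result=$(first_diff_offset "$1" "$2")
        echo "run $i: ${result:-identical}"
        i=$((i + 1))
    done
}
```

For example, `repeat_cmp /dvd/file1 /mnt/pvfs2/file1 5` prints five lines. If the offset changes between runs of the same unchanged files, the data being read back is unstable rather than consistently wrong on disk.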
>> Hope that helps
>> Mark
>>
>>
>> > >
>> > >> >
>> > >> > I have 3 terminals open showing:
>> > >> > tail -f /var/log/messages
>> > >> > tail -f /var/log/dmsg
>> > >> > tail -f /var/log/syslog
>> > >> > I see no output to these files throughout this exercise.
>> > >> >
>> > >> > The pvfs2-client.log files from each machine are attached.
>> > >> >
>> > >> > It seems this problem occurs when PVFS2 is under load?
>> > >>
>> > >> Possibly. When you get to the end of those steps, are you still
>> > >> able to access files in the file system?
>> > >
>> > > Yes I can. I can open a text file copied from the DVD-A, but then it
>> > > seems text or small files never fail the cmp/diff verification.
>> > >
>> > > Thanks again for the time and effort put into these exchanges and the
>> > > development of PVFS.
>> > >
>> > > Regards
>> > > Mark
>> > >
>> > >> Thanks,
>> > >>
>> > >> -sam
>> > >>
>> > >> >
>> > >> > Hope this helps.
>> > >> > Mark
>> > >> >
>> > >> >> Thanks,
>> > >> >>
>> > >> >> Rob
>> > >> >>
>> > >> >> Mark Van De Vyver wrote:
>> > >> >> > Thanks Steve,
>> > >> >> > I don't see any problem until I run the diff or cmp and even
>> > >> >> > then these indicate the files are identical if the cmp is run
>> > >> >> > _immediately_ after the file copy.
>> > >> >> > cmp and diff only indicate a difference when a file is
>> > >> >> > 'checked' after some other files have been copied-checked.
>> > >> >> >
>> > >> >> > The files are from the NYSE trade and quote (TAQ) DVD's, so
>> > >> >> > they are text stored as binary.
>> > >> >> >
>> > >> >> > You might be able to try the following with a dozen or so large
>> > >> >> > binary files, I have approx 300-400GB stored in the PVFS area.
>> > >> >> >
>> > >> >> > Ideally the following should be run on two or more PVFS2
>> > >> >> > servers at the same time, apply this to several DVD's that have
>> > >> >> > not been copied to the PVFS area, then reapply the script to
>> > >> >> > the same DVD's after they have been copied.
>> > >> >> > The following is a slightly simplified version of my script -
>> > >> >> > here I don't delete and re-copy when an existing file fails the
>> > >> >> > cmp verification:
>> > >> >> >
>> > >> >> > # untested script start
>> > >> >> > for src in /dvd/*large.bin
>> > >> >> > do
>> > >> >> >     fn=${src##*/}   # file name without the /dvd/ prefix
>> > >> >> >     if [ -f "/mnt/pvfs2/${fn}" ]
>> > >> >> >     then
>> > >> >> >         # This should 'fail' more frequently than the cmp in the else clause
>> > >> >> >         if ! cmp "${src}" "/mnt/pvfs2/${fn}"
>> > >> >> >         then
>> > >> >> >             echo "Preexisting copy not exact - more frequent and random?"
>> > >> >> >         fi
>> > >> >> >     else
>> > >> >> >         cp "${src}" "/mnt/pvfs2/${fn}"
>> > >> >> >         if ! cmp "${src}" "/mnt/pvfs2/${fn}"
>> > >> >> >         then
>> > >> >> >             echo "Initial copy not exact - less frequent and random"
>> > >> >> >         fi
>> > >> >> >     fi
>> > >> >> > done
>> > >> >> > # untested script end
>> > >> >> >
>> > >> >> > Regards
>> > >> >> > Mark
>> > >> >> >
>> > >> >> > On 3/2/07, Steve <steve at bov.nu> wrote:
>> > >> >> >> My setup is a little different in that at the moment I have
>> > >> >> >> 2 I/O services running on one box, a metadata on another and a
>> > >> >> >> client/samba server on a third. I have moved in the data via
>> > >> >> >> samba. We have copied in mp3's and avi/mpg's as well as large
>> > >> >> >> ISO's plus software exe's. Surely after several weeks of use we
>> > >> >> >> would notice some problem?
>> > >> >> >>
>> > >> >> >> I do have another box set up as a client that happens to have a
>> > >> >> >> DVD-ROM drive in it.
>> > >> >> >>
>> > >> >> >> What type of files? A vob?
>> > >> >> >> What sequence of commands would I need to do to test your
>> > >> >> >> problem?
>> > >> >> >> If I get a little spare time I could try for you?
>> > >> >> >>
>> > >> >> >> Steve
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> -------Original Message-------
>> > >> >> >>
>> > >> >> >> From: Mark Van De Vyver
>> > >> >> >> Date: 02/03/2007 08:18:11
>> > >> >> >> To: Steve
>> > >> >> >> Subject: Re: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
>> > >> >> >>
>> > >> >> >> Hi Steve,
>> > >> >> >>
>> > >> >> >> > Not sure if this helps any but I have copied over 500gb of
>> > >> >> >> > media files to pvfs2 running on old dell's 533 to 866 CPU with
>> > >> >> >> > very little ram running on caos3 beta 3. Although I haven't
>> > >> >> >> > done any checks other than using the media I haven't noticed
>> > >> >> >> > any problems.
>> > >> >> >>
>> > >> >> >> The failures might be spurious....?
>> > >> >> >>
>> > >> >> >> > Could you have problems with the dvd device ?
>> > >> >> >>
>> > >> >> >> I doubt it - but it may not be impossible?
>> > >> >> >> This happens with the DVD drives on all three nodes, and when I
>> > >> >> >> just have one node 'working' the diff/cmp failures either don't
>> > >> >> >> occur or occur very, very rarely. Start all three nodes 'working'
>> > >> >> >> and I see roughly 1 out of 2 binary files fail the initial
>> > >> >> >> diff/cmp check, but very very few (one every couple of DVD's)
>> > >> >> >> fail the cmp/diff check immediately after the copy is done.....
>> > >> >> >>
>> > >> >> >> Thanks
>> > >> >> >> Mark
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> > -------Original Message-------
>> > >> >> >> >
>> > >> >> >> > From: Mark Van De Vyver
>> > >> >> >> > Date: 02/03/2007 03:26:40
>> > >> >> >> > To: pvfs2-users at beowulf-underground.org
>> > >> >> >> > Subject: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
>> > >> >> >> >
>> > >> >> >> > Hi,
>> > >> >> >> > This is a follow up on an earlier email where I reported that
>> > >> >> >> > PVFS 1.5.1 failed to copy binary files from several DVD's.
>> > >> >> >> >
>> > >> >> >> > I'm running a 3 node Rocks 4.2.1 Cluster, CentOS4.4, x86_64,
>> > >> >> >> > nodes are connected via an unmanaged switch.
>> > >> >> >> >
>> > >> >> >> > I have reinstalled the Rocks Cluster (all nodes), including the
>> > >> >> >> > PVFS2 Roll.
>> > >> >> >> > The cluster is set up with the frontend as the metadata server
>> > >> >> >> > and the other two nodes are PVFS2 I/O servers and clients. The
>> > >> >> >> > /mnt/pvfs2 area is on a 3 disk RAID 0 partition formatted as ext3.
>> > >> >> >> > After installing I ran the test steps in the "PVFS2 Quick Start
>> > >> >> >> > Guide". The test steps ran without error.
>> > >> >> >> > I upgraded to PVFS 2.6.2 on all nodes and re-ran the test steps,
>> > >> >> >> > again no errors or problems.
>> > >> >> >> >
>> > >> >> >> > I build PVFS 2.6.2 with the following:
>> > >> >> >> >
>> > >> >> >> > ./configure --with-kernel=</path/to/kernel26/> \
>> > >> >> >> >     --enable-kernel-sendfile --prefix=/usr/local/pvfs2/
>> > >> >> >> >
>> > >> >> >> > Then type
>> > >> >> >> > make all
>> > >> >> >> > make kmod_install
>> > >> >> >> > make install
>> > >> >> >> >
>> > >> >> >> > On each node I have a script that lists the files on the DVD
>> > >> >> >> > disc loaded on that node.
>> > >> >> >> > Each file is copied if it does not exist on the HDD (PVFS area)
>> > >> >> >> > and the copy is immediately verified:
>> > >> >> >> >
>> > >> >> >> > cp /dvd/file1 /mnt/pvfs2/file1
>> > >> >> >> > cmp /dvd/file1 /mnt/pvfs2/file1
>> > >> >> >> >
>> > >> >> >> > `cmp` does not report any error.
>> > >> >> >> > This has been done for 60-70 DVD.
>> > >> >> >> >
>> > >> >> >> > If I insert a DVD that has previously been copied my script
>> > >> >> >> > finds that a file exists in the PVFS area and does a `cmp` with
>> > >> >> >> > the DVD file, if the file fails this comparison the file is
>> > >> >> >> > deleted, copied, verified (cmp).
>> > >> >> >> >
>> > >> >> >> > I notice that frequently and randomly the previously copied
>> > >> >> >> > files will fail the _initial_ `cmp` check if more than one node
>> > >> >> >> > is 'active', i.e. processing a DVD.
>> > >> >> >> > Once deleted and copied the second `cmp` check is passed.
>> > >> >> >> >
>> > >> >> >> > Some details:
>> > >> >> >> > The files do not fail the `cmp` check immediately after being
>> > >> >> >> > copied - only when checking a previously copied file.
>> > >> >> >> > The `cmp` result indicates a different byte at which the files
>> > >> >> >> > differ.
>> > >> >> >> > Re-inserting the same dvd several times results in different
>> > >> >> >> > files failing the first `cmp` check.
>> > >> >> >> > The second check (immediately after the copy is finished) is
>> > >> >> >> > always passed.
>> > >> >> >> > This occurs rarely, if at all (i.e. I haven't noticed it), when
>> > >> >> >> > only one node is processing a DVD.
>> > >> >> >> > This only occurs with binary files - which are relatively large
>> > >> >> >> > 200MB - 2GB.
>> > >> >> >> > This never occurs with text files - which are also small
>> > >> >> >> > 100'sKB.
>> > >> >> >> > The pvfs2-client.log file is empty on each node.
>> > >> >> >> > I have tried using diff and experience the same results.
>> > >> >> >> >
>> > >> >> >> > This is similar to an error I was seeing in PVFS 1.5.1 - hence
>> > >> >> >> > the upgrade. I've also changed my previous script which `dd`
>> > >> >> >> > copied the DVD to memory (approx 8GB), then wrote this ISO file
>> > >> >> >> > to the PVFS2 area - this worked fine for initial copies, but
>> > >> >> >> > failed for re-copies. At that time I wasn't verifying the copy,
>> > >> >> >> > so it was the copy to the PVFS2 area that failed.....
>> > >> >> >> >
>> > >> >> >> > Finally, on one occasion when manually running `cmp` on a file I
>> > >> >> >> > noticed the following sequence.
>> > >> >> >> > cmp file1 file2 (pass)
>> > >> >> >> > cmp file1 file2 (pass)
>> > >> >> >> > diff file1 file2 (fail)
>> > >> >> >> > cmp file1 file2 (fail)
>> > >> >> >> >
>> > >> >> >> > Is this known behavior with a known workaround/configuration
>> > >> >> >> > setting?
>> > >> >> >> > The behavior I see made me guess a caching or network issue
>> > >> >> >> > (there are no other machines on the cluster network).
>> > >> >> >> > Can anyone suggest PVFS configuration settings that will make
>> > >> >> >> > PVFS more robust.
>> > >> >> >> >
>> > >> >> >> > I'm not a programmer or linux guru - I just spent this summer
>> > >> >> >> > converting from winxp...
>> > >> >> >> > I'm happy to explore some possible fixes, but don't assume too
>> > >> >> >> > much :)
>> > >> >> >> >
>> > >> >> >> > Thanks in advance
>> > >> >> >> > Mark
>> > >> >> >> > _______________________________________________
>> > >> >> >> > Pvfs2-users mailing list
>> > >> >> >> > Pvfs2-users at beowulf-underground.org
>> > >> >> >> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>> > >> >> >>
>> > >> >> >
>> > >> >>
>> > >> >> <copy-taq-dvd-monitor.sh>
>> > >> >> <copy-taq-dvd.sh>
>> > >> >> <pvfs2-client.log.frontend>
>> > >> >> <pvfs2-client.log.compute-0-0>
>> > >> >> <pvfs2-client.log.compute-0-1>
>> > >>
>> > >> <pvfs2-fs.conf>
>> > >> <pvfs2-server.conf-frontend>
>> > >> <pvfs2-server.conf-pvfs2-compute-0-0>
>> > >> <pvfs2-server.conf-pvfs2-compute-0-1>
>> >
>>