[Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
Mark Van De Vyver
mvyver at gmail.com
Fri Mar 2 21:17:11 EST 2007
Hi Steve
> I will try the script. In the meantime, I tried to hammer it this
> afternoon by copying a 500 MB ISO and repeatedly running cmp; I saw no
> errors. It also occurred to me that when I copied the 500 GB in, I used a
> mirroring application, which would have flagged any bad copies as files
> needing an update. I saw none.
I also see no errors when only one machine is copying/verifying
to the PVFS2 area. Was your error-free run a case where several
machines were accessing/writing to the one PVFS2 area?
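
For anyone trying the multi-writer case, here is a minimal sketch of the
copy-or-verify pass as a single function. It assumes bash and cmp;
`verify_tree` and its two arguments are hypothetical names, not part of
my real script, and starting it on every client node at about the same
time is only my guess at what triggers the failures.

```shell
#!/bin/bash
# Sketch only: copy-or-verify every regular file in a source directory
# against a destination directory (e.g. the shared PVFS2 mount).
# verify_tree is a hypothetical helper name.
verify_tree() {
    local src="$1" dst="$2" path fn failures=0
    for path in "$src"/*; do
        [ -f "$path" ] || continue
        fn=${path##*/}
        if [ -f "$dst/$fn" ]; then
            # Existing copy: the check reported to fail intermittently.
            if ! cmp -s "$path" "$dst/$fn"; then
                echo "MISMATCH existing: $fn"
                failures=$((failures + 1))
            fi
        else
            # New file: copy, then verify straight away.
            cp "$path" "$dst/$fn"
            if ! cmp -s "$path" "$dst/$fn"; then
                echo "MISMATCH fresh: $fn"
                failures=$((failures + 1))
            fi
        fi
    done
    echo "failures=$failures"
}
```

Usage would be something like `verify_tree /media/cdrom /mnt/pvfs2`, run
on two or more nodes at once, each with its own DVD loaded.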
Regards
Mark
>
> -------Original Message-------
> From: Mark Van De Vyver
> Date: 02/03/2007 19:17:40
> To: Steve
> Cc: pvfs2-users at beowulf-underground.org
> Subject: Re: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
>
> Hi Steve,
>
> I don't have access to the cluster now, but the following script has a
> few fixes.
> I haven't yet tested copying from a non-pvfs area to pvfs with pvfs 2.6.2.
> I saw something similar in pvfs 1.5.1 when copying from a tmpfs area to
> pvfs2.
> Running `mount` should show you if the dvd is auto-mounted and under
> what directory, in which case my mount below is redundant and you'll
> need to replace the '/media/cdrom/' references.
>
> # untested script start
> mkdir -p /media/cdrom
> # you may have to insert your system's device name here
> mount /dev/hdb /media/cdrom
>
> for path in /media/cdrom/*.*
> do
>   # strip the directory part, leaving just the file name
>   fn=${path##*/}
>   if [ -f "/mnt/pvfs2/${fn}" ]
>   then
>     # This should 'fail' more frequently than the cmp in the else clause
>     if ! cmp "/media/cdrom/${fn}" "/mnt/pvfs2/${fn}"
>     then
>       echo "Pre-existing copy not exact - more frequent and random?"
>     fi
>   else
>     cp "/media/cdrom/${fn}" "/mnt/pvfs2/${fn}"
>     if ! cmp "/media/cdrom/${fn}" "/mnt/pvfs2/${fn}"
>     then
>       echo "Initial copy not exact - less frequent and random"
>     fi
>   fi
> done
> # untested script end
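
The script above compares each file once. Before re-copying anything, it
may also help to read the same PVFS2 file several times and check that
every read returns the same bytes. The sketch below assumes bash and
md5sum from GNU coreutils; `reread_check` is a hypothetical helper name,
and I have not run it on the cluster.

```shell
#!/bin/bash
# Sketch only: checksum one file several times to see whether repeated
# reads return identical data. reread_check is a hypothetical helper.
reread_check() {
    local file="$1" rounds="${2:-3}" first="" sum i=1
    while [ "$i" -le "$rounds" ]; do
        sum=$(md5sum < "$file" | cut -d' ' -f1)
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            # Same file, different bytes on a later read.
            echo "unstable: read $i differs from read 1"
            return 1
        fi
        i=$((i + 1))
    done
    echo "stable: $first"
}
```

If the checksum changes between rounds, the stored data may be intact and
the read path would be the thing to look at. A stable but wrong checksum
points the other way. Bear in mind that a local cache could hide
differences between rounds; that is a general caveat, not something I
have confirmed for PVFS2.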
>
> Thanks
> Mark
>
> On 3/2/07, Steve <steve at bov.nu> wrote:
> > Well, I thought I'd try a manual cp.
> >
> > I have never mounted a DVD under Linux, only CD-ROMs. I mounted a movie
> > DVD and got an I/O error when trying to copy. I mounted a data DVD burned
> > under Windows and the mount fails with a wrong-filesystem error.
> >
> > What's your mount command syntax?
> >
> > BTW, do you get the same if you copy your files to a local non-pvfs2 disk
> > and then use your script?
> >
> > -------Original Message-------
> > From: Mark Van De Vyver
> > Date: 02/03/2007 09:40:30
> > To: Steve
> > Cc: pvfs2-users at beowulf-underground.org
> > Subject: Re: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
> >
> > Thanks Steve,
> >
> > I don't see any problem until I run the diff or cmp and even then
> > these indicate the files are identical if the cmp is run _immediately_
> > after the file copy.
> > Cmp and diff only indicate a difference when a file is 'checked' after
> > some other files have been copied-checked.
> >
> > The files are from the NYSE trade and quote (TAQ) DVD's, so they are
> > text stored as binary.
> >
> > You might be able to try the following with a dozen or so large binary
> > files, I have approx 300-400GB stored in the PVFS area.
> >
> > Ideally the following should be run on two or more PVFS2 servers at
> > the same time, apply this to several DVD's that have not been copied
> > to the PVFS area, then reapply the script to the same DVD's after they
> > have been copied.
> >
> > The following is a slightly simplified version of my script - here I
> > don't delete and re-copy when an existing file fails the cmp
> > verification:
> >
> > # untested script start
> > for path in /dvd/*large.bin
> > do
> >   # strip the leading /dvd/ so fn is just the file name
> >   fn=${path##*/}
> >   if [ -f "/mnt/pvfs2/${fn}" ]
> >   then
> >     # This should 'fail' more frequently than the cmp in the else clause
> >     if ! cmp "/dvd/${fn}" "/mnt/pvfs2/${fn}"
> >     then
> >       echo "Pre-existing copy not exact - more frequent and random?"
> >     fi
> >   else
> >     cp "/dvd/${fn}" "/mnt/pvfs2/${fn}"
> >     if ! cmp "/dvd/${fn}" "/mnt/pvfs2/${fn}"
> >     then
> >       echo "Initial copy not exact - less frequent and random"
> >     fi
> >   fi
> > done
> > # untested script end
> >
> > Regards
> > Mark
> >
> > On 3/2/07, Steve <steve at bov.nu> wrote:
> > > My setup is a little different in that at the moment I have 2 I/O
> > > services running on one box, a metadata server on another and a
> > > client/samba server on a third. I have moved in the data via samba. We
> > > have copied in mp3s and avi/mpgs as well as large ISOs plus software
> > > exes. Surely after several weeks of use we would notice some problem?
> > >
> > > I do have another box set up as a client that happens to have a DVD-ROM
> > > drive in it.
> > >
> > > What type of files? A vob?
> > >
> > > What sequence of commands would I need to run to test your problem?
> > >
> > > If I get a little spare time I could try it for you.
> > >
> > > Steve
> > >
> > > -------Original Message-------
> > > From: Mark Van De Vyver
> > > Date: 02/03/2007 08:18:11
> > > To: Steve
> > > Subject: Re: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
> > >
> > > Hi Steve,
> > >
> > > > Not sure if this helps any but I have copied over 500GB of media
> > > > files to pvfs2 running on old Dells, 533 to 866 CPU with very little
> > > > RAM, running on caos3 beta 3. Although I haven't done any checks other
> > > > than using the media, I haven't noticed any problems.
> > >
> > > The failures might be spurious....?
> > >
> > > > Could you have problems with the dvd device?
> > >
> > > I doubt it - but it may not be impossible?
> > > This happens with the DVD drives on all three nodes, and when I just
> > > have one node 'working' the diff/cmp failures either don't occur or
> > > occur very, very rarely. Start all three nodes 'working' and I see
> > > roughly 1 out of 2 binary files fail the initial diff/cmp check, but
> > > very very few (one every couple of DVDs) fail the cmp/diff check
> > > immediately after the copy is done.....
> > >
> > > Thanks
> > > Mark
> > >
> > > > -------Original Message-------
> > > > From: Mark Van De Vyver
> > > > Date: 02/03/2007 03:26:40
> > > > To: pvfs2-users at beowulf-underground.org
> > > > Subject: [Pvfs2-users] PVFS 2.6.2 intermittent cmp/diff failure
> > > >
> > > > Hi,
> > > >
> > > > This is a follow up on an earlier email where I reported that PVFS
> > > > 1.5.1 failed to copy binary files from several DVDs.
> > > >
> > > > I'm running a 3 node Rocks 4.2.1 Cluster, CentOS 4.4, x86_64; the
> > > > nodes are connected via an unmanaged switch.
> > > >
> > > > I have reinstalled the Rocks Cluster (all nodes), including the PVFS2
> > > > Roll.
> > > > The cluster is set up with the frontend as the metadata server and the
> > > > other two nodes are PVFS2 I/O servers and clients. The /mnt/pvfs2
> > > > area is on a 3 disk RAID 0 partition formatted as ext3.
> > > > After installing I ran the test steps in the "PVFS2 Quick Start
> > > > Guide". The test steps ran without error.
> > > > I upgraded to PVFS 2.6.2 on all nodes and re-ran the test steps, again
> > > > with no errors or problems.
> > > >
> > > > I built PVFS 2.6.2 with the following:
> > > >
> > > > ./configure --with-kernel=</path/to/kernel26/> \
> > > >     --enable-kernel-sendfile --prefix=/usr/local/pvfs2/
> > > >
> > > > then typed
> > > >
> > > > make all
> > > > make kmod_install
> > > > make install
> > > >
> > > > On each node I have a script that lists the files on the DVD disc
> > > > loaded on that node.
> > > > Each file is copied if it does not exist on the HDD (PVFS area) and
> > > > the copy is immediately verified:
> > > >
> > > > cp /dvd/file1 /mnt/pvfs2/file1
> > > > cmp /dvd/file1 /mnt/pvfs2/file1
> > > >
> > > > `cmp` does not report any error.
> > > > This has been done for 60-70 DVDs.
> > > >
> > > > If I insert a DVD that has previously been copied my script finds that
> > > > a file exists in the PVFS area and does a `cmp` with the DVD file; if
> > > > the file fails this comparison the file is deleted, copied, verified
> > > > (cmp).
> > > >
> > > > I notice that frequently and randomly the previously copied files will
> > > > fail the _initial_ `cmp` check if more than one node is 'active', i.e.
> > > > processing a DVD.
> > > > Once deleted and copied the second `cmp` check is passed.
> > > >
> > > > Some details:
> > > >
> > > > The files do not fail the `cmp` check immediately after being copied -
> > > > only when checking a previously copied file.
> > > >
> > > > The `cmp` result indicates a different byte at which the files differ.
> > > >
> > > > Re-inserting the same DVD several times results in different files
> > > > failing the first `cmp` check.
> > > >
> > > > The second check (immediately after the copy is finished) is always
> > > > passed.
> > > >
> > > > This occurs rarely, if at all (i.e. I haven't noticed it), when only
> > > > one node is processing a DVD.
> > > >
> > > > This only occurs with binary files - which are relatively large,
> > > > 200MB - 2GB.
> > > >
> > > > This never occurs with text files - which are also small, 100s of KB.
> > > >
> > > > The pvfs2-client.log file is empty on each node.
> > > >
> > > > I have tried using diff and experience the same results.
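
Since `cmp` reports where the files differ, the differing offsets could be
summarised to see whether they cluster in particular regions of the file.
The sketch below is only an illustration: `diff_strips` is a hypothetical
helper name, and the 65536-byte default is my assumption about the stripe
unit, so pass whatever your configuration actually uses.

```shell
#!/bin/bash
# Sketch only: count differing bytes between two files, grouped by
# fixed-size region. diff_strips is a hypothetical helper; the default
# region size of 65536 bytes is an assumption, not a confirmed setting.
diff_strips() {
    local a="$1" b="$2" strip="${3:-65536}"
    # cmp -l prints the 1-based decimal offset of every differing byte.
    cmp -l "$a" "$b" 2>/dev/null |
        awk -v s="$strip" '{ n[int(($1 - 1) / s)]++ }
            END { for (k in n) print "strip " k ": " n[k] " differing bytes" }' |
        sort -t' ' -k2,2n
}
```

Usage would be something like `diff_strips /dvd/file1 /mnt/pvfs2/file1`.
Differences confined to a few whole regions would look quite different
from isolated bytes scattered through the file.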
> > > >
> > > > This is similar to an error I was seeing in PVFS 1.5.1 - hence the
> > > > upgrade. I've also changed my previous script, which `dd` copied the
> > > > DVD to memory (approx 8GB), then wrote this ISO file to the PVFS2 area
> > > > - this worked fine for initial copies, but failed for re-copies. At
> > > > that time I wasn't verifying the copy, so it was the copy to the
> > > > PVFS2 area that failed.....
> > > >
> > > > Finally, on one occasion when manually running `cmp` on a file I
> > > > noticed the following sequence.
> > > >
> > > > cmp file1 file2 (pass)
> > > > cmp file1 file2 (pass)
> > > > diff file1 file2 (fail)
> > > > cmp file1 file2 (fail)
> > > >
> > > > Is this known behavior with a known workaround/configuration setting?
> > > > The behavior I see made me guess a caching or network issue (there are
> > > > no other machines on the cluster network).
> > > > Can anyone suggest PVFS configuration settings that will make PVFS
> > > > more robust?
> > > >
> > > > I'm not a programmer or linux guru - I just spent this summer
> > > > converting from winxp...
> > > > I'm happy to explore some possible fixes, but don't assume too much :)
> > > >
> > > > Thanks in advance
> > > > Mark
> > > > _______________________________________________
> > > > Pvfs2-users mailing list
> > > > Pvfs2-users at beowulf-underground.org
> > > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users