[Pvfs2-users] BLAST failed to format database or fetch sequence from database stored on PVFS2 filesystem
Yun He
jarod at spg.biosci.tsinghua.edu.cn
Fri Dec 7 23:53:39 EST 2007
Hi,
Yesterday I found that I always get the message "[blastpgp] WARNING: [000.000]
Failed to initialize search. ISAM Error code is -5" when I run blastpgp against
a database stored on the parallel filesystem PVFS2, but the warning does not
occur when I search the same database shared over NFS.
I set up PVFS2 on a small cluster whose nodes are identical (same CPU, same
memory). I have tried both BLAST 2.2.15 and the latest 2.2.17, and both
versions give the warning.
Here is an example:
1) /data/blastdb is a directory holding several databases such as nr; it is
exported from the master node of the cluster to the compute nodes over NFS;
2) /pool/blastdb is the mount point of the PVFS2 (version 2.6.3) filesystem on
all nodes; its contents are identical to those of /data/blastdb (I used rsync
to make them identical);
3) I used a small test set of about 100 sequences to run blastpgp against the
nr database in both directories. Every run on /pool/blastdb complained
"[blastpgp] WARNING: [000.000] Failed to initialize search. ISAM Error code is
-5", but none of the runs on /data/blastdb did;
4) It seems that BLAST fails to fetch some sequences from the database on the
PVFS2 filesystem, which triggers the warning. I used fastacmd to fetch a
sequence:
a) fetching from the database on NFS works:
$ fastacmd -s "gi|34495614" -d /data/blastdb/nr
>gi|34495614|ref|NP_899829.1| sulfite dehydrogenase - subunitB
[Chromobacterium violaceum ATCC 12472] >gi|34101469|gb|AAQ57838.1| sulfite
dehydrogenase - subunitB [Chromobacterium violaceum ATCC 12472]
MRAALLALALLAAPAGAASIALPNETAMLPDSGHPGYQAALRRCLVCHSADYIALQPDFDEARWRAVVDKMRLAFKAPIP
AEEAAPIAAYLADAQRRRLLRPHPPQP
b) fetching from the database on PVFS2 fails:
$ fastacmd -s "gi|34495614" -d /pool/blastdb/nr
[fastacmd] ERROR: Accesion search failed for "gi|34495614" with error code -5
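To separate "the copy is damaged" from "the files are intact but cannot be read correctly on PVFS2", a minimal diagnostic sketch could first compare the two database trees file by file and then compare fastacmd output for the same identifiers on both mounts. It assumes a POSIX shell with GNU coreutils (`md5sum`, `sort`) and `find`; `same_tree` and `compare_fetch` are hypothetical helper names, and only the `-s` and `-d` options of fastacmd already used above are relied on.

```shell
# Hypothetical helper: succeed (return 0) only if every regular file under
# both directories has the same relative path and the same MD5 checksum.
# Returns 2 if either directory does not exist.
same_tree() {
    [ -d "$1" ] && [ -d "$2" ] || return 2
    sum_a=$(cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
    sum_b=$(cd "$2" && find . -type f -exec md5sum {} + | sort -k 2)
    [ "$sum_a" = "$sum_b" ]
}

# Hypothetical helper: fetch each identifier from two database paths with
# fastacmd and report any identifier whose output (stdout + stderr) differs.
# $1: first database (e.g. on NFS), $2: second database (e.g. on PVFS2),
# remaining arguments: sequence identifiers such as "gi|34495614".
compare_fetch() {
    db_a=$1; db_b=$2; shift 2
    status=0
    for id in "$@"; do
        out_a=$(fastacmd -s "$id" -d "$db_a" 2>&1)
        out_b=$(fastacmd -s "$id" -d "$db_b" 2>&1)
        if [ "$out_a" != "$out_b" ]; then
            echo "MISMATCH: $id"
            status=1
        fi
    done
    return $status
}
```

For example, `same_tree /data/blastdb /pool/blastdb` followed by `compare_fetch /data/blastdb/nr /pool/blastdb/nr "gi|34495614"`. If the trees match but `compare_fetch` still reports mismatches, the database files themselves are intact, which would point at how the tools access the files on PVFS2 rather than at the rsync copy.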
Next, I used formatdb to format the nr database both on local disk and on
PVFS2. It succeeded on local disk but failed on PVFS2.
This is the formatdb log from the local disk run, which succeeded:
========================[ Dec 6, 2007 9:07 PM ]========================
Version 2.2.17 [Aug-26-2007]
Started database file "nr"
Closing volume nr with 2976302 sequences, 999,999,232 letters(.psq file =
1002976321 bytes; .phr file = 846550430 bytes)
Formatted 2976302 sequences in volume 0
Version 2.2.17 [Aug-26-2007]
Started database file "nr"
Formatted 2702180 sequences in volume 1
SUCCESS: formatted database nr
This is the formatdb log from the PVFS2 run, which reports the errors:
========================[ Dec 6, 2007 9:28 PM ]========================
Version 2.2.17 [Aug-26-2007]
Started database file "nr"
Closing volume nr with 2976302 sequences, 999,999,232 letters(.psq file =
1002976321 bytes; .phr file = 846550430 bytes)
ERROR: [000.000] Failed to create index: ISAMErrorCode -5.
Removed single-volume database nr
FATAL ERROR: [001.000] Fatal error when adding sequence to BLAST database.
Why does this happen?
A 2002 paper (J. D. Grant et al., Bioinformatics 2002, 18(5): 765-766) reports
that its authors developed a distributed BLAST and PSI-BLAST on a cluster with
the database actually stored on PVFS.
Is PVFS2 suitable for storing BLAST databases?
--
Yun He Ph.D.
National Laboratory of Biomacromolecules
Institute of Biophysics, Chinese Academy of Sciences
15 Datun Road, Chaoyang District
Beijing 100101
China
Tel: +86 010 6488 8487
E-mail: jarod at nlbmol.ibp.ac.cn
or jarod at spg.biosci.tsinghua.edu.cn