[PVFS-users] pvfs mgr crash

Franco M. Bladilo bladilo at rice.edu
Tue Jun 28 11:19:07 EDT 2005


After working flawlessly for almost 3 months, we are having problems with 
the PVFS mgr. This is the output in mgr.log:

[I 06/27 16:17] ----- Log Level Changing -----
[I 06/27 16:17] Current Logging Level includes :
[I 06/27 16:17] New     Logging Level includes : CRITICAL  WARNING
[C 06/27 16:18] (mgr.c,376) socket=[5] closed
[C 06/27 16:18] (mgr.c,376) socket=[5] closed
[C 06/27 16:18] (mgr.c,376) socket=[5] closed
[C 06/27 16:18] (mgr.c,376) socket=[5] closed
[C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
is usually due to running a newer mgr on an old PVFS file system or 
someone mucking with the files in the metadata directory (which they 
should not do).  Aborting!
[C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
[C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/epilogue_41260.management.log is not the correct size.  This 
is usually due to running a newer mgr on an old PVFS file system or 
someone mucking with the files in the metadata directory (which they 
should not do).  Aborting!
[C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
[C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
is usually due to running a newer mgr on an old PVFS file system or 
someone mucking with the files in the metadata directory (which they 
should not do).  Aborting!
[C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
[C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/epilogue_41260.management.log is not the correct size.  This 
is usually due to running a newer mgr on an old PVFS file system or 
someone mucking with the files in the metadata directory (which they 
should not do).  Aborting!
[C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
[C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
is usually due to running a newer mgr on an old PVFS file system or 
someone mucking with the files in the metadata directory (which they 
should not do).  Aborting!
[C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
...
The same errors repeat for hundreds of files until the mgr reaches this 
point and crashes with signal 11:

[C 06/28 04:06] (metaio.c,88) meta_open: Metadata file 
/pvfs-meta/dtabakov/50-50-LF-c200-r0.2-2.5-by-0.1-f-0.1-1.0-by-0.1-initAcpt-crap/new-s-50-r-2.30-f-0.20--120-of-200 
is not the correct size.  This is usually due to running a newer mgr on 
an old PVFS file system or someone mucking with the files in the 
metadata directory (which they should not do).  Aborting!
[C 06/28 04:06] (md_stat.c,118) md_stat, meta_open
        errno     : [22]
        errno msg : [Invalid argument]
[C 06/28 04:06] (mgr.c,2576) Received signal=[11]
[C 06/28 04:06] (mgr.c,2578)
OPEN FILES:
[C 06/28 04:06] (mgr.c,2585) Current working directory: [/]
[C 06/28 04:06] (mgr.c,2587) pid: [30259]
[C 06/28 04:06] (mgr.c,2594) rlim_cur (RLIMIT_CORE): [0]
[C 06/28 04:06] (mgr.c,2595) rlim_max (RLIMIT_CORE): [-1]
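
Since the same complaint repeats for hundreds of entries, a condensed 
list of the affected metadata files is easier to work with than the raw 
log. The following is only a rough sketch, assuming the "is not the 
correct size" message always has the form shown above and that the 
paths contain no whitespace. The function name affected_files is made 
up for illustration and is not part of PVFS.

```python
import re
from collections import Counter

# Matches the meta_open complaint from mgr.log. Using \s+ tolerates the
# line wrapping seen in the pasted log. Assumption: paths have no spaces.
BAD_SIZE = re.compile(r"Metadata file\s+(\S+)\s+is not the correct size")


def affected_files(log_text):
    """Return a Counter mapping each reported metadata path to the
    number of times mgr complained about it."""
    return Counter(BAD_SIZE.findall(log_text))


def print_summary(log_path):
    """Print one line per affected file, most frequently reported first."""
    with open(log_path) as fh:
        counts = affected_files(fh.read())
    for path, hits in counts.most_common():
        print(f"{hits:6d}  {path}")
```

Calling print_summary() with the location of mgr.log would print each 
distinct path once, together with how often it was reported.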

After restarting the mgr, any operation on the mounted PVFS filesystem 
complains about corrupted or non-existent files:

[root at io1 shared.scratch]# ls -la
ls: epilogue_41172.management.log: Invalid argument
ls: epilogue_41260.management.log: Invalid argument
total 308521
drwxrwxrwx    1 root     root        20480 Jun 28 10:13 .
drwxr-xr-x   24 root     root         4096 May 16 14:05 ..
drwxr-xr-x    1 juanp    scisim       4096 Jun 10 02:50 40045.management
drwxr-xr-x    1 juanp    scisim       4096 Jun 11 06:43 40046.management
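
Because mgr reports the metadata files as "not the correct size", one 
way to see how many entries are affected is to compare file sizes 
directly in the metadata directory on the mgr node (/pvfs-meta in the 
log above), not through the PVFS mount. The sketch below is read-only 
and rests on two assumptions: that healthy per-file metadata entries 
all share one size, so the most common size is taken as the expected 
one, and that dot-files are bookkeeping files that can be skipped. The 
function name odd_sized_files is hypothetical.

```python
import os
import stat
from collections import Counter


def odd_sized_files(meta_dir):
    """Walk meta_dir and return (presumed_size, odd), where presumed_size
    is the most common regular-file size (ASSUMED to be the correct
    metadata size) and odd is a sorted list of (path, size) pairs that
    differ from it. Nothing is modified."""
    sizes = {}
    for root, _dirs, files in os.walk(meta_dir):
        for name in files:
            if name.startswith("."):
                continue  # assumption: dot-files are not per-file metadata
            path = os.path.join(root, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                sizes[path] = st.st_size
    if not sizes:
        return None, []
    presumed_size, _count = Counter(sizes.values()).most_common(1)[0]
    odd = sorted((p, s) for p, s in sizes.items() if s != presumed_size)
    return presumed_size, odd
```

If most entries are damaged, the majority size is no longer a useful 
reference, so the result should be treated as a hint rather than a 
verdict.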

Here's the iod log from when the crash happened:

[root at io1 tmp]# cat iolog.0Vr60K

[I 06/27 16:18] ----- Log Level Changing -----
[I 06/27 16:18] Current Logging Level includes :
[I 06/27 16:18] New     Logging Level includes : CRITICAL  WARNING
[W 06/27 16:18] (iod.c,289) socket=[5] hung up
[W 06/27 16:18] (iod.c,289) socket=[5] hung up
[W 06/27 16:31] (iod.c,697)  open: 064/f49958.0 exists (flags = c2); saving
[W 06/27 17:25] (iod.c,697)  open: 066/f49960.0 exists (flags = c2); saving
[W 06/27 17:50] (iod.c,697)  open: 067/f49961.0 exists (flags = c2); saving
[W 06/27 18:15] (iod.c,697)  open: 069/f49963.0 exists (flags = c2); saving
[W 06/27 18:36] (iod.c,697)  open: 070/f49964.0 exists (flags = c2); saving
[W 06/27 18:41] (iod.c,697)  open: 072/f49966.0 exists (flags = c2); saving
[W 06/27 18:56] (iod.c,697)  open: 073/f49967.0 exists (flags = c2); saving
[W 06/27 19:01] (iod.c,697)  open: 074/f49968.0 exists (flags = c2); saving
[W 06/27 19:05] (iod.c,697)  open: 076/f49970.0 exists (flags = c2); saving
[W 06/28 04:06] (iod.c,289) socket=[5] hung up

There were no hardware failures, and all clients, iods, and the mgr run 
the same PVFS version (1.6.3) on ia64-based systems.

Any ideas?

Thanks in advance,

-- 
Franco Bladilo
Linux/HPCC Administrator
Research Computing Group
Rice University
bladilo at rice.edu



