[PVFS-users] pvfs mgr crash

Rob Ross rross at mcs.anl.gov
Tue Jun 28 13:49:14 EDT 2005


Hi Franco,

First, thanks for the very thorough bug report.

Typically those error messages mean exactly what they say: that someone 
has been messing with the metadata files.  It could also indicate 
something is amiss with that local file system.

It would be helpful for you to run "ls -l" on a few of those files and see 
what size they are.  The mgr uses a very simple size check, so it is 
unlikely that it is misreporting the files as damaged.
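
If there are too many files to check by hand, something like the following 
could tally the sizes for you.  This is only a minimal sketch: the function 
names are made up for illustration, and it assumes that healthy metadata 
files mostly share one size, which is a guess rather than something taken 
from the mgr source.  Directory bookkeeping files may legitimately differ.

```python
import os
import stat
from collections import Counter


def file_sizes(root):
    """Map each regular file under root to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # Skip files that vanish or cannot be stat'ed.
                continue
            if stat.S_ISREG(st.st_mode):
                sizes[path] = st.st_size
    return sizes


def odd_sized(sizes):
    """Return (most common size, [(path, size), ...] for files that differ).

    Assumption: the most common size is the "expected" one.
    """
    if not sizes:
        return None, []
    common, _count = Counter(sizes.values()).most_common(1)[0]
    odd = sorted((p, s) for p, s in sizes.items() if s != common)
    return common, odd
```

Run it on the mgr node against the local metadata directory (for example 
odd_sized(file_sizes("/pvfs-meta")), using the path from your log), not 
through the PVFS mount.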

If there is any data at all in the files, it would be helpful to see 
what it is.
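
For looking at the contents, a hex dump of the first few bytes is usually 
enough.  A small sketch, where head_hex is a hypothetical helper name and 
the 64-byte default is arbitrary:

```python
def head_hex(path, limit=64):
    """Return the first `limit` bytes of a file as space-separated hex."""
    with open(path, "rb") as f:
        data = f.read(limit)
    # An empty file yields an empty string.
    return " ".join("%02x" % b for b in data)
```

An empty result means the file has no data at all, which would itself be 
worth reporting.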

It might be a good idea to run an fsck on the file system holding your 
metadata.  Something seems wrong there.  Is that just a single, local disk?

The iods are behaving as they should, I think.  Let's concentrate on the 
mgr for now.

Regards,

Rob

Franco M. Bladilo wrote:
> After working flawlessly for almost 3 months, we are having problems with 
> the pvfs mgr. This is the output in mgr.log:
> 
> [I 06/27 16:17] ----- Log Level Changing -----
> [I 06/27 16:17] Current Logging Level includes :
> [I 06/27 16:17] New     Logging Level includes : CRITICAL  WARNING
> [C 06/27 16:18] (mgr.c,376) socket=[5] closed
> [C 06/27 16:18] (mgr.c,376) socket=[5] closed
> [C 06/27 16:18] (mgr.c,376) socket=[5] closed
> [C 06/27 16:18] (mgr.c,376) socket=[5] closed
> [C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
> is usually due to running a newer mgr on an old PVFS file system or 
> someone mucking with the files in the metadata directory (which they 
> should not do).  Aborting!
> [C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> [C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/epilogue_41260.management.log is not the correct size.  This 
> is usually due to running a newer mgr on an old PVFS file system or 
> someone mucking with the files in the metadata directory (which they 
> should not do).  Aborting!
> [C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> [C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
> is usually due to running a newer mgr on an old PVFS file system or 
> someone mucking with the files in the metadata directory (which they 
> should not do).  Aborting!
> [C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> [C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/epilogue_41260.management.log is not the correct size.  This 
> is usually due to running a newer mgr on an old PVFS file system or 
> someone mucking with the files in the metadata directory (which they 
> should not do).  Aborting!
> [C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> [C 06/27 16:18] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/epilogue_41172.management.log is not the correct size.  This 
> is usually due to running a newer mgr on an old PVFS file system or 
> someone mucking with the files in the metadata directory (which they 
> should not do).  Aborting!
> [C 06/27 16:18] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> ...
> It continues with hundreds of file entries until it reaches this point 
> and crashes:
> 
> [C 06/28 04:06] (metaio.c,88) meta_open: Metadata file 
> /pvfs-meta/dtabakov/50-50-LF-c200-r0.2-2.5-by-0.1-f-0.1-1.0-by-0.1-initAcpt-crap/new-s-50-r-2.30-f-0.20--120-of-200 
> is not the correct size.  This is usually due to running a newer mgr on 
> an old PVFS file system or someone mucking with the files in the 
> metadata directory (which they should not do).  Aborting!
> [C 06/28 04:06] (md_stat.c,118) md_stat, meta_open
>        errno     : [22]
>        errno msg : [Invalid argument]
> [C 06/28 04:06] (mgr.c,2576) Received signal=[11]
> [C 06/28 04:06] (mgr.c,2578)
> OPEN FILES:
> [C 06/28 04:06] (mgr.c,2585) Current working directory: [/]
> [C 06/28 04:06] (mgr.c,2587) pid: [30259]
> [C 06/28 04:06] (mgr.c,2594) rlim_cur (RLIMIT_CORE): [0]
> [C 06/28 04:06] (mgr.c,2595) rlim_max (RLIMIT_CORE): [-1]
> 
> After restarting the mgr, any operation on the mounted pvfs filesystem 
> complains about corrupted or nonexistent files:
> 
> [root@io1 shared.scratch]# ls -la
> ls: epilogue_41172.management.log: Invalid argument
> ls: epilogue_41260.management.log: Invalid argument
> total 308521
> drwxrwxrwx    1 root     root        20480 Jun 28 10:13 .
> drwxr-xr-x   24 root     root         4096 May 16 14:05 ..
> drwxr-xr-x    1 juanp    scisim       4096 Jun 10 02:50 40045.management
> drwxr-xr-x    1 juanp    scisim       4096 Jun 11 06:43 40046.management
> 
> Here's the log on the iods from when the crash happened:
> 
> [root@io1 tmp]# cat iolog.0Vr60K
> 
> [I 06/27 16:18] ----- Log Level Changing -----
> [I 06/27 16:18] Current Logging Level includes :
> [I 06/27 16:18] New     Logging Level includes : CRITICAL  WARNING
> [W 06/27 16:18] (iod.c,289) socket=[5] hung up
> [W 06/27 16:18] (iod.c,289) socket=[5] hung up
> [W 06/27 16:31] (iod.c,697)  open: 064/f49958.0 exists (flags = c2); saving
> [W 06/27 17:25] (iod.c,697)  open: 066/f49960.0 exists (flags = c2); saving
> [W 06/27 17:50] (iod.c,697)  open: 067/f49961.0 exists (flags = c2); saving
> [W 06/27 18:15] (iod.c,697)  open: 069/f49963.0 exists (flags = c2); saving
> [W 06/27 18:36] (iod.c,697)  open: 070/f49964.0 exists (flags = c2); saving
> [W 06/27 18:41] (iod.c,697)  open: 072/f49966.0 exists (flags = c2); saving
> [W 06/27 18:56] (iod.c,697)  open: 073/f49967.0 exists (flags = c2); saving
> [W 06/27 19:01] (iod.c,697)  open: 074/f49968.0 exists (flags = c2); saving
> [W 06/27 19:05] (iod.c,697)  open: 076/f49970.0 exists (flags = c2); saving
> [W 06/28 04:06] (iod.c,289) socket=[5] hung up
> 
> There were no hardware failures, and all clients, iods, and the mgr run 
> the same pvfs version (1.6.3) on ia64-based systems.
> 
> Any ideas?
> 
> Thanks in advance,
> 

