[PVFS-users] setup_buffer() failure 8288 error on write pvfsdev: cannot allocate memory

Kent Milfeld milfeld@tacc.utexas.edu
Fri, 30 Jan 2004 16:49:04 -0600


Hi,
      On our first installation of pvfs we used "buffer=mapped", and
      a simple pvfs test produced the memory problems below.
      After receiving advice not to use mapped (and after seeing
      some earlier problems at the archive site), we backed off from using
      the mapping.
      So, I unmounted the file system, stopped the client daemons, and
      removed the pvfs module.  Using the command
        insmod pvfs
      I obtained the defaults, dynamic and 16MB buffer, on the
      282 nodes. After restarting the daemons, and remounting /mnt/pvfs
      and running the test again (the same MPI-IO code, checking
      each node singly), I discovered that 37 nodes out of 282
      still consistently produced the memory (buffer) error
      (reported in the /var/log/kern file).  (These were probably
      nodes where something problematic happened when PVFS was
      running earlier in the "mapped" mode.)

      BY REBOOTING THOSE PROBLEM NODES, THE PROBLEM WENT AWAY.
      So I'm very happy about that!

      But it bothers me that I don't know what happened
      to the problem nodes. I did discover that if I wrote records that
      were below 1MB, they worked just fine, but if the write calls were
      sending 1MB or over, I would get the error. (See below.)
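
      The size threshold described above could be narrowed down with a
      small probe along the following lines. This is only a sketch, and
      it rests on assumptions: it uses plain POSIX write() calls rather
      than the MPI-IO test code mentioned earlier, and the function name
      probe_write_sizes is made up for illustration. Each size is issued
      as a single write() so the request size is known.

```python
import os


def probe_write_sizes(directory, sizes):
    """Issue one write() per size; map size -> None (ok) or an error string."""
    results = {}
    for size in sizes:
        path = os.path.join(directory, "probe_%d.dat" % size)
        payload = b"\0" * size
        try:
            # os.open/os.write bypass stdio buffering, so each request
            # is handed to the kernel as one write of `size` bytes.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            try:
                written = os.write(fd, payload)
            finally:
                os.close(fd)
            if written == size:
                results[size] = None
            else:
                results[size] = "short write: %d bytes" % written
        except OSError as exc:
            results[size] = "%s (errno %d)" % (exc.strerror, exc.errno)
        finally:
            # Remove the probe file so repeated runs start clean.
            if os.path.exists(path):
                os.remove(path)
    return results
```

      For example, probe_write_sizes("/mnt/pvfs", [512 * 1024,
      1024 * 1024 - 1, 1024 * 1024, 2 * 1024 * 1024]) on a good node and
      on a problem node would show whether the failure starts exactly at
      1MB.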

      I did not reboot two of the problem nodes so that I can possibly run
      some more tests to find out why.  I've looked at the slab information
      in /proc and found that on both the good nodes, and the bad ones, the
      pvfs space is about 18MB, and nothing else is using space 
      excessively.  IPCS shows two 32MB sections being used.  
      This appears on both good and bad nodes:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       600        1056768    9          dest
0x00000000 32769      root       600        33554432   9          dest
0x00000000 65538      root       600        33554432   9          dest
0x00000000 98307      root       600        46084      9          dest


      So I don't think anything is wrong with these segments. (Although I
      would like to find out later what is consuming 64MB of the shared
      memory space (we have 2GB on each node); it might be mpich-gm.)
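
      To compare the segments on good and bad nodes more mechanically,
      the ipcs listing can be totalled with a short script. The sketch
      below assumes the column order shown above (key, shmid, owner,
      perms, bytes, nattch, status); the function names are
      illustrative only.

```python
def parse_ipcs_segments(text):
    """Return (shmid, bytes, nattch) tuples from `ipcs`-style segment rows."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows begin with a hex key such as 0x00000000;
        # headers and blank lines are skipped.
        if len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append((int(fields[1]), int(fields[4]), int(fields[5])))
    return segments


def total_bytes(segments, minimum=0):
    """Sum the sizes of all segments of at least `minimum` bytes."""
    return sum(size for _, size, _ in segments if size >= minimum)


# The listing quoted in this message.
IPCS_OUTPUT = """\
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          root       600        1056768    9          dest
0x00000000 32769      root       600        33554432   9          dest
0x00000000 65538      root       600        33554432   9          dest
0x00000000 98307      root       600        46084      9          dest
"""

segments = parse_ipcs_segments(IPCS_OUTPUT)
MIB = 1024 * 1024
print("all segments: %d bytes" % total_bytes(segments))
print("segments >= 32MB: %d MB" % (total_bytes(segments, 32 * MIB) // MIB))
```

      Running the same script on the output from a good node and a bad
      node makes any difference in the totals easy to spot.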

      My question is:  
      Are there any utilities that I can use, or any log files I can check,
      or any experiments I can run to determine what happened to the
      memory on the "problem nodes" that makes a write fail when it needs
      buffer space?
      (I know that the mapped option probably caused the problem, but why
       did it only occur on 37 nodes and not the rest? And why can't I find
       anything to indicate the problem?)

Thanks for your help. (Please don't waste time on this, because PVFS is
working fine after a reboot; if you have a quick answer or advice, I would
appreciate it.)

Kent
 

Jan 13 10:33:24 compute-10-17 kernel: pvfs: debug = 0x0, maxsz = 16777216 bytes, buffer = dynamic, major = 0
...
Jan 27 20:14:18 compute-10-17 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 27 20:14:18 compute-10-17 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 27 20:14:18 compute-10-17 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600628
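
With 282 nodes, it may help to tally these kernel messages per host
rather than read each /var/log/kern by hand. The following is a rough
sketch: the pattern is inferred only from the three error lines quoted
above (one unwrapped message per line), so other PVFS messages may not
match, and count_pvfs_errors is a made-up name.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jan 27 20:14:18 compute-10-17 kernel: (pvfsdev.c, 1356): pvfsdev: ...
PVFS_MSG = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+kernel:\s+"
    r"\((?P<file>[\w.]+),\s*(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


def count_pvfs_errors(lines):
    """Count matching messages per host and per (source file, line) site."""
    per_host = Counter()
    per_site = Counter()
    for raw in lines:
        match = PVFS_MSG.match(raw.strip())
        if match:
            per_host[match.group("host")] += 1
            per_site[(match.group("file"), int(match.group("line")))] += 1
    return per_host, per_site


# The log lines quoted in this message, one message per line.
SAMPLE = [
    "Jan 13 10:33:24 compute-10-17 kernel: pvfs: debug = 0x0, "
    "maxsz = 16777216 bytes, buffer = dynamic, major = 0",
    "Jan 27 20:14:18 compute-10-17 kernel: (pvfsdev.c, 1356): "
    "pvfsdev: could not allocate memory!",
    "Jan 27 20:14:18 compute-10-17 kernel: (pvfsdev.c, 1032): "
    "pvfsdev: setup_buffer() failure.",
    "Jan 27 20:14:18 compute-10-17 kernel: (ll_pvfs.c, 553): "
    "ll_pvfs_file_write failed on 2600628",
]

per_host, per_site = count_pvfs_errors(SAMPLE)
for host, count in sorted(per_host.items()):
    print("%s: %d messages" % (host, count))
```

Feeding it the concatenated kern logs from all nodes would list which
hosts still report the allocation failure after the module reload.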

Kent
Texas Advanced Computing Center
www.tacc.utexas.edu/general/staff
Please use consulting form at:  www.tacc.utexas.edu/consulting
 
>-----Original Message-----
>From: pvfs-users-admin@beowulf-underground.org
>[mailto:pvfs-users-admin@beowulf-underground.org] On Behalf Of Nathan Poznick
>Sent: Tuesday, January 27, 2004 12:09 AM
>To: pvfs-users@beowulf-underground.org
>Subject: Re: [PVFS-users] setup_buffer() failure 8288 error on write
>pvfsdev: cannot allocate memory
>
>Thus spake milfeld@tacc.utexas.edu:
>>
>> The Question:
>>     Is there a fix for this problem?
>>     If so, can it be used for this version (possibly by not using a
>>     "mapped" buffer?).
>
>I would certainly try using buffer=static, since the kernel you are
>using has CONFIG_HIGHMEM4G=y, and the documentation states that mapped
>is only available for machines without CONFIG_HIGHMEM[4|64]G enabled.
>If buffer=static doesn't work, there's always buffer=dynamic to fall
>back on as well.
>
>>     Do we need to upgrade to fix the problem?
>
>Perhaps not to fix this specific problem, but you should upgrade if
>possible to gain the many other fixes which have gone into the
>filesystem since 1.5.6 (of which there have been more than a few).
>
>
>--
>Nathan Poznick <poznick@conwaycorp.net>
>
>Modern Americans are so exposed, peered at, inquired about, and spied
>upon as to be increasingly without privacy-members of a naked society
>and denizens of a goldfish bowl. - Edward V. Long