[PVFS-users] setup_buffer() failure 8288 error on write pvfsdev: cannot allocate memory

milfeld@tacc.utexas.edu milfeld@tacc.utexas.edu
Mon, 26 Jan 2004 23:22:13 -0600 (CST)


Hi,

I'm having a problem with our newly installed PVFS, and if I cannot resolve it.

We have installed the pvfs-kernel interface 1.5.6 on our 600 processor system (lonestar):

   1.) The kernel version is (uname -a):
       Linux <hostname> 2.4.18-27.7.xsmp #1 SMP Thu Jan 8 13:16:14 CST 2004 i686 unknown
       It contains perfcrt patches for PAPI.

   2.) CC is the gcc compiler.

   3.) modules.config contains:
          alias char-major-60 pvfs          
          options pvfs buffer=mapped

   4.) "mknod /dev/pvfsd c 60 0" is used to create the pvfsd device file.

   5.) There are 15 dual-xeon systems running dual iod's, and the master
       node is on the login node.

   6.) A GigE switch hierarchy uses Dell PowerConnect 5224's aggregated through 3248's
       to form the network for PVFS between the compute nodes and the I/O server nodes.

Problem:
On a simple MPI-IO test, using 16-64 clients I get an I/O error 8288 (sometime many) back from
the MPI-IO write. (The code runs fine on our IBM GPFS system; and I've included
it at the end.)  On the nodes where the IO error occurs we get setup_buffer and memory errors 
reported below in the /var/log/kern log.

       1.) Below, I have reproduced the error three times on the same node. (I performed
           a "./pvfsd restart" in /etc/init.d after each failure, and resubmitted the job
           using a different file for writing.)  Sometimes the mount point hangs after
           this error.

       2.) I noticed the 8288 in previous  interactions, but that number appears not to
           point to a definitive problem.  

       3.) I have a feeling that the problem is caused by the "setup_buffer" failure, 
           and that it might be caused when using the  "options pvfs buffer=mapped" 
           option above.  Also, I found a previous history for this problem at:

           http://www.beowulf-underground.org/pipermail/pvfs-developers/2003-January/001107.html

           and in a correspondence between Murali Vilayannur and  Nathan Poznick

           >> Thus spake Murali Vilayannur:
           >>> Oh. The process seems to have been stuck in setup_buffer(), when it
           >>> received the signal? What was the command line options to insmod pvfs.o?
           >>> Could you also try with insmod pvfs.o buffer=3Dstatic and see if
           >>> the same thing happens?=20

           >>We were using buffer=3Ddynamic  (can't use mapped since we use
           >>CONFIG_HIGHMEM4G, and actually have 4G in the clients).  I switched it
           >>to buffer=3Dstatic, re-ran the problem job, and bingo!  It finishes without
           >>any errors.  We're running the job again now to make sure (but this is
           >>the first time it's actually finished), and after that, I'm going to
           >>switch back to kpvfsd with buffer=3Dstatic and run it once more.


The Question:
    Is there a fix for this problem?
    If so, can it be used for this version (possibly by not using a "mapped" buffer?).
    It looks as though we should try a different buffer setting.  What value should be used
    (3Dstatic, static, etc.)?
    Do we need to upgrade to fix the problem?


Thank you for your help; and more importantly, I thank all who have made PVFS a success!!!!!

Kent Milfeld
Texas Advanced Computing Center
          




-------------------------------- /var/log/kern --------------------------------------------
Jan 22 11:25:40 compute-3-22 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 22 11:25:40 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 22 11:25:40 compute-3-22 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 22 11:25:40 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 22 11:25:40 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600340
Jan 22 11:25:40 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600340
Jan 26 22:12:21 compute-3-22 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 26 22:12:21 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 26 22:12:21 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600620
Jan 26 22:17:09 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 26 22:17:09 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600620
Jan 26 22:17:57 compute-3-22 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 26 22:17:57 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 26 22:17:57 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600621
Jan 26 22:21:18 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 26 22:21:18 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600621
Jan 26 22:22:21 compute-3-22 kernel: (pvfsdev.c, 1356): pvfsdev: could not allocate memory!
Jan 26 22:22:21 compute-3-22 kernel: (pvfsdev.c, 1032): pvfsdev: setup_buffer() failure.
Jan 26 22:22:21 compute-3-22 kernel: (ll_pvfs.c, 553): ll_pvfs_file_write failed on 2600622
[root@compute-3-22 log]# 


-------------------------------- ipcs --------------------------------------------
[root@compute-3-22 log]# ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 0          root      600        1056768    9          dest         
0x00000000 32769      root      600        33554432   9          dest         
0x00000000 163842     root      600        33554432   9          dest         
0x00000000 98307      root      600        46084      9          dest         

------ Semaphore Arrays --------
key        semid      owner      perms      nsems      status      
0x00000000 65536      apache    600        1         

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    



-------------------------------- top on compute-3-22  --------------------------------------------

 10:28pm  up 13 days,  7:19,  1 user,  load average: 1.06, 0.92, 0.57
70 processes: 66 sleeping, 2 running, 2 zombie, 0 stopped
CPU0 states:  0.0% user,  0.1% system,  0.0% nice, 99.4% idle
CPU1 states: 100.0% user,  0.0% system,  0.0% nice,  0.0% idle
Mem:  2064716K av,  678452K used, 1386264K free,       0K shrd,   95004K buff
Swap: 2096440K av,    4972K used, 2091468K free                  371792K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10490 milfeld   25   0 39836  38M 18372 R    99.9  1.9   5:53 a.out9

-------------------------------- a.out9 (dual processor) compute-3-22 --------------------------------

[root@compute-3-22 log]# ps -elf | grep a.out9
000 R milfeld  10490 10489 99  85   0    - 12921 -      22:22 ?        00:07:49 ./a.out9
000 S milfeld  10499 10498  0  77   0    - 12919 end    22:22 ?        00:00:00 ./a.out9


-------------------------------- code --------------------------------
module comm_mpi
   include 'mpif.h'
   integer                            :: ntasks, irank, ierr_mpi 
   integer                            :: namelen
   character(32)                      :: name
end module

module comm_mpi_io
    integer, parameter                 :: NTIMES=1
    integer, parameter                 :: NBYTES=1024*1024*16
    real*8, dimension(NBYTES/8)        :: buf
    real*8                             :: bcalc
end module

module comm
    integer, parameter                 :: KI8=selected_int_kind(12)
end module


program block_sequence
   
   use comm
   use comm_mpi
   use comm_mpi_io

   implicit none

!  character(27)          :: filename="/rmount/gpfs15/milfeld/test"
!                                      123456789012345678901234567901234567890
!  character(23)          :: filename="/mnt/pvfs/milfeld/test3"
   character(15)          :: filename="/mnt/pvfs/test9"

   integer     :: nargs, iargc, length_fn
   integer     :: i, knt, j

   integer                          :: fh, status(MPI_STATUS_SIZE)
   integer(kind=KI8)                :: offset

   integer :: istart,iend,itar0,itar1,irate
   real*8  :: tdiff,fmbps


                        !MPI Setup

   call mpi_init(ierr_mpi)
   call mpi_comm_size(MPI_COMM_WORLD, ntasks, ierr_mpi)
   call mpi_comm_rank(MPI_COMM_WORLD,  irank, ierr_mpi)
   call MPI_Get_processor_name(name, namelen, ierr_mpi);

   print*,"Checking in from task",irank
      offset = irank*NBYTES

      call system_clock(count_rate=irate)
      call system_clock(count=itar0)
      call system_clock(count=itar1)

      call system_clock(count=istart)

     do j = 1,NBYTES/8
        buf(j) = j + irank*(NBYTES/8)
     end do

      call system("hostname; ls -l /mnt/pvfs")
      do i=1, NTIMES
   
         print*, "RANK ",irank,"   i=",i,"   Opening  ", filename
   
                        ! MPI_MODE_CREATE+MPI_MODE_RDWR = 8 + 4
   
         call MPI_FILE_OPEN(MPI_COMM_WORLD, filename,                     &
     &                      MPI_MODE_CREATE+MPI_MODE_RDWR, MPI_INFO_NULL, &
     &                      fh, ierr_mpi)

         print*,"OPEN IOERR=",ierr_mpi
   
         call MPI_FILE_SEEK( fh, offset,  MPI_SEEK_SET,         ierr_mpi)
         print*,"SEEK IOERR=",ierr_mpi
         call MPI_FILE_WRITE(fh, buf, NBYTES, MPI_BYTE, status, ierr_mpi)
         print*,"WRITE IOERR=",ierr_mpi,"   host=",name
   
         call MPI_FILE_CLOSE(fh, ierr_mpi)
         print*,"CLOSE IOERR=",ierr_mpi
    
   
         call MPI_BARRIER(MPI_COMM_WORLD, ierr_mpi)

     do j = 1,NBYTES/8
        buf(j) = 0.0D0
     end do
   
     
    
!        call MPI_FILE_OPEN(MPI_COMM_WORLD, filename,                    &
!    &                                     MPI_MODE_RDWR, MPI_INFO_NULL, &
!    &                     fh, ierr_mpi)
!  
!        call MPI_FILE_SEEK(fh, offset,  MPI_SEEK_SET,         ierr_mpi)
!        call MPI_FILE_READ(fh, buf, NBYTES, MPI_BYTE, status, ierr_mpi)

!    do j = 1,NBYTES/8
!       bcalc = j + irank*(NBYTES/8) 
!       
!       if( buf(j) .ne. bcalc ) then
!          print*, "task=",irank,"  diff rear/calc =",buf(j),bcalc
!          stop
!       end if
!    end do
!  
!        call MPI_FILE_CLOSE(fh, ierr_mpi)
   

         call MPI_BARRIER(MPI_COMM_WORLD, ierr_mpi)
   
      end do
   
   call system_clock(count=iend)

   if(irank .eq. 0) then
      tdiff = real(iend-istart - (itar1-itar0))/real(irate)
      fmbps = NTIMES*(NBYTES/(1024*1024))*ntasks/tdiff
      print*,' time = ',tdiff, '  MB/sec =',fmbps,'  for ',ntasks,' tasks.'
   endif



   call MPI_Finalize(ierr_mpi) 

   
end program

! not used here********************VVVV
subroutine get_cmdln_filename(length_fn,max_length,filename)
   implicit none
   character(*)   :: filename
   integer        :: length_fn,max_length

   external          length_string
   integer        :: length_string
   integer        :: i, nargs, iargc


      
      do i = 1,max_length
         filename(i:i) = " "
      end do

      nargs = iargc()
      if(nargs .lt. 1  .or.  nargs .gt. 2) then
         print*,"ERROR: SYNTAX: <prog> [-filename <file> | <filename ]"
         stop
      endif
     
      do i = 1,nargs
         call getarg(i,filename)
      enddo

      length_fn = length_string(max_length, 1,filename)

      print*,"length=",length_fn,"  filename.",filename(1:length_fn),"."


end subroutine


integer function length_string(max_len,idirection,string)
!
!---------------------------------------------------------------------------
!     Return string length.
!
! INPUT : 
!          max_len    Defined length of character string
!          ibackward  
!                     -1 = String ends at first non-blank in backward search
!                      1 = String ends at first non-blank in forward search.
!          string     String to be search.
!---------------------------------------------------------------------------

   implicit none
   character(*)   :: string
   integer        :: max_len,idirection
   integer        :: i, istart, iend

   if(idirection .ne. 1 .and. idirection .ne. -1) then
      print*, "ERROR: in function length_string: idirection .ne. 1 or -1" 
      stop "ERROR: length_string"
   endif

   if(idirection .eq. 1) then
      istart = 1
      iend   = max_len
      length_string = -1 !max_len
   else
      istart = max_len
      iend   = 1
      length_string = -1 !1
   endif

!  Search for first     " " character on forward  search
!  Search for first non-" " character on backward search

   do i = istart, iend, idirection

      if( idirection .eq. 1) then 
         if(string(i:i).eq.' ')then
            length_string = i-1
            return
         end if
      end if 

      if( idirection .eq.-1) then 
         if(string(i:i).ne.' ')then
            length_string = i
            return
         end if
      end if 


   end do

   return

end function