Showing posts with label gpu. Show all posts

09 May 2013

409.B.GAMESS US with GPU support on debian wheezy --the ACML edition. This works.


Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below, using nvidia-smi, observed GPU usage, although the speed-up was not major.
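One way to settle the question without relying on the GAMESS output is to sample the card while a job is running. The sketch below assumes that your nvidia-smi supports the --query-gpu CSV interface (older drivers may not); the helper name gpu_was_used is made up for this post.

```shell
#!/bin/sh
# gpu_was_used: reads CSV samples "utilization %, memory MiB" on stdin
# and succeeds if at least one sample shows non-zero GPU utilization.
gpu_was_used() {
    awk -F',' '$1 + 0 > 0 { used = 1 } END { exit(used ? 0 : 1) }'
}

# Typical use while a job is running in another terminal (assumption:
# this nvidia-smi accepts --query-gpu; take ~30 one-second samples):
#   nvidia-smi --query-gpu=utilization.gpu,memory.used \
#              --format=csv,noheader,nounits -l 1 | head -n 30 > gpu.csv
#   gpu_was_used < gpu.csv && echo "GPU was busy" || echo "GPU stayed idle"
```

A non-zero utilization only shows that something used the card during the sampling window, so run it on an otherwise idle machine.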

Blogspot needs versioning.
I lost the entire post when it was almost complete. Screw this.

Everything compiles fine, but no GPU output during calculation.

I see no evidence of the GPU being used at any stage.  Otherwise all is good -- the calcs run fine on the CPU.

Maybe someone else will have a better idea.

I looked at libcchem/aaa.readme.1st and http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html to get as far as I did.

Setting up gamess
Get gamess (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html). Put gamess-current.tar.gz in ~/tmp

sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir ~/tmp
cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R


ACML
Download both the 'regular' and the int64 gfortran packages from AMD:
http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-downloads-resources/#download

tar xvf acml-5-3-1-gfortran-64bit-int64.tgz
tar xvf acml-5-3-1-gfortran-64bit.tgz
sh install-acml-5-3-1-gfortran-64bit-int64.sh
Where do you want to install ACML? Press return to use the default
location (/opt/acml5.3.1), or enter an alternative path.
The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
sh install-acml-5-3-1-gfortran-64bit.sh
Where do you want to install ACML? Press return to use the default
location (/opt/acml5.3.1), or enter an alternative path.
The directory will be created if it does not already exist.
> /opt/acml/acml5.3.1
You'll get something like this:
/opt/acml/acml5.3.1
|-- Doc
|-- gfortran64
|-- gfortran64_fma4
|-- gfortran64_fma4_int64
|-- gfortran64_fma4_mp
|-- gfortran64_fma4_mp_int64
|-- gfortran64_int64
|-- gfortran64_mp
|-- gfortran64_mp_int64
`-- util

where
*  fma4 is for cpus with FMA4 support (use util/cpuid to check)
*  int64 is for 64-bit integers (integer*8), not for double-precision floats
*  mp is for OpenMP. For MPI do not use the _mp_ libraries!
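The naming scheme above can be turned into a small helper that picks the matching directory. This is only a sketch: the function name acml_dir is hypothetical, and it assumes that the kernel reports an "fma4" flag in /proc/cpuinfo on CPUs with FMA4 support (util/cpuid from the ACML package remains the authoritative check).

```shell
#!/bin/sh
# acml_dir FLAGS INT64 MP
#   FLAGS : space-separated CPU flags, e.g. from /proc/cpuinfo
#   INT64 : "yes" to pick the 64-bit-integer (integer*8) build
#   MP    : "yes" to pick the OpenMP build (leave "no" for MPI runs)
acml_dir() {
    name=gfortran64
    case " $1 " in
        *" fma4 "*) name="${name}_fma4" ;;
    esac
    if [ "$3" = yes ]; then name="${name}_mp"; fi
    if [ "$2" = yes ]; then name="${name}_int64"; fi
    echo "/opt/acml/acml5.3.1/$name"
}

# Example (assumption: Linux exposes an "fma4" flag in /proc/cpuinfo):
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   acml_dir "$flags" yes no
```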

Pick your library/ies and add them to the LD_LIBRARY_PATH, e.g.:
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_int64/lib' >> ~/.bashrc
source ~/.bashrc
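Rerunning the echo line above appends a duplicate entry to ~/.bashrc every time. A minimal sketch of an idempotent alternative follows; append_once is a hypothetical helper name, not something shipped with ACML or GAMESS.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Example:
#   append_once ~/.bashrc 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/acml/acml5.3.1/gfortran64_int64/lib'
```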


CBLAS
sudo mkdir -p /opt/netlib
sudo chown ${USER} /opt/netlib
cd /opt/netlib/
wget http://www.netlib.org/blas/blast-forum/cblas.tgz
tar xvf cblas.tgz
cd CBLAS/

Edit Makefile.LINUX
BLLIB = /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a
CBLIB = ../lib/cblas_$(PLAT).a
cp Makefile.LINUX Makefile.in
make

patching libboost
sudo su
cd /usr/include/boost
patch -p1 < /opt/gamess_cuda/libcchem/boot/
exit

Make the following changes by hand if the patch didn't work:

/usr/include/boost/mpl/aux_/integral_wrapper.hpp
// other compilers (e.g. MSVC) are not particulary happy about it
#if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
typedef struct AUX_WRAPPER_NAME type;

/usr/include/boost/mpl/size_t_fwd.hpp
BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
#if defined(__CUDACC__)
typedef std::size_t std_size_t;
template< std_size_t N > struct size_t;
#else
template< std::size_t N > struct size_t;
#endif

BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE

/usr/include/boost/mpl/size_t.hpp
#if defined(__CUDACC__)
#define AUX_WRAPPER_VALUE_TYPE std_size_t
#define AUX_WRAPPER_NAME size_t
#define AUX_WRAPPER_PARAMS(N) std_size_t N
#else
#define AUX_WRAPPER_VALUE_TYPE std::size_t
#define AUX_WRAPPER_NAME size_t
#define AUX_WRAPPER_PARAMS(N) std::size_t N
#endif

HDF5
mkdir ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall
This package will be built according to these values:

0 -  Maintainer: [ root@neon ]
1 -  Summary: [ hdf5-cxx]
2 -  Name:    [ hdf5-1.8.10 ]
3 -  Version: [ 1.8.10-1 ]
4 -  Release: [ 1 ]
5 -  License: [ GPL ]
6 -  Group:   [ checkinstall ]
7 -  Architecture: [ amd64 ]
8 -  Source location: [ hdf5-1.8.10-patch1 ]
9 -  Alternate source location: [  ]
10 - Requires: [  ]
11 - Provides: [ hdf5-1.8.10 ]
12 - Conflicts: [  ]
13 - Replaces: [  ]
Make sure to edit the Version field since Patch-1 leads to an error (must start with digit).
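The rule behind that error is simply that the version string has to begin with a digit. A tiny sketch of a pre-flight check (the helper name starts_with_digit is hypothetical) could look like this:

```shell
#!/bin/sh
# starts_with_digit VERSION: succeed if VERSION begins with 0-9,
# which is what the package version field requires.
starts_with_digit() {
    case "$1" in
        [0-9]*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
#   starts_with_digit "1.8.10-1" && echo "version looks acceptable"
```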

LIBCCHEM
Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp and /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp. Insert
#include <stddef.h>
somewhere at the beginning of each file.

cd /opt/gamess_cuda/libcchem
./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install


Configure GAMESS US
cd /opt/gamess_cuda
./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda]
GAMESS build directory? [/opt/gamess_cuda]
Version? [00] 12
Please enter your choice of FORTRAN: gfortran
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': acml
enter this full pathname: /opt/acml/acml5.3.1
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Compile
cd ddi/
./compddi
cd ..

Edit comp
# see ~/gamess/libcchem/aaa.readme.1st for more information
set GPUCODE=true
if ($GPUCODE == true) then

and

# -fno-whole-file suppresses argument's data type checking
set OPT='-O0'
if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"

./compall

Edit lked
#
set GPUCODE=true
#
# 5. optional MPQC interface

and

case openmpi:
   set MPILIBS="-L$GMS_MPI_PATH/lib"
   set MPILIBS="$MPILIBS -lmpi -lpthread"
   breaksw

and

if ($GPUCODE == true) then
   echo " Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
   set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
   set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   ### GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/acml/acml5.3.1/gfortran64_int64/lib/libacml.a /opt/netlib/CBLAS/lib/cblas_LINUX.a"
   set GPU_LIBS="$GPU_LIBS -lz"
   set GPU_LIBS="$GPU_LIBS -lstdc++"
   ### GPU_LIBS="$GPU_LIBS -lgomp"
   set GPU_LIBS="$GPU_LIBS -lpthread"
   echo " libcchem GPU code's libraries are"
   echo "$GPU_LIBS"
else

./lked gamess gpu.12
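
A quick sanity check after linking is to confirm that the binary actually pulls in the CUDA runtime and cuBLAS. The sketch below assumes both were linked dynamically (as the -lcudart -lcublas flags above suggest); links_cuda is a hypothetical helper name.

```shell
#!/bin/sh
# links_cuda: reads `ldd` output on stdin and succeeds only if both
# libcudart and libcublas appear among the shared libraries.
links_cuda() {
    awk '/libcudart/ { rt = 1 } /libcublas/ { blas = 1 }
         END { exit((rt && blas) ? 0 : 1) }'
}

# Example:
#   ldd /opt/gamess_cuda/gamess.gpu.12.x | links_cuda && echo "CUDA libraries linked"
```

This only shows that the libraries are linked in, not that any GPU kernel runs during a calculation.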

Run script
Create gpurun:
#!/bin/csh -v
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
set PPN=$3
@ NUMGPU=1
if ($NUMGPU > 0) then
   @ NUMCPU = $NCPUS - 1
   echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
   set echo
   setenv CCHEM_PROFILE 1
   setenv NUM_THREADS $NCPUS
   setenv GPU_DEVICES 0
   #--if ($NUMGPU == 0) setenv GPU_DEVICES -1
   #--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
   #--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
   #setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
   ###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
   unset echo
else
   echo NO GPU
   setenv GPU_DEVICES -1
endif
if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05
setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e $PUNCH ) rm $PUNCH
if ( -e $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh
setenv LD_LIBRARY_PATH /opt/openmpi/1.6/lib:/opt/netlib/CBLAS/lib:/opt/acml/acml5.3.1/gfortran64_int64/lib
set path= ( /opt/openmpi/1.6/bin $path )
/opt/openmpi/1.6/bin/mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x|tee $JOB.out
cp $PUNCH .

Run chmod +x gpurun to make it executable.
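
The commented-out GPU_DEVICES lines in the script follow a simple pattern: -1 for no GPU, otherwise a comma-separated list of device indices starting at 0. Here is a sketch of that mapping in plain sh rather than csh; the function name gpu_devices is hypothetical, and the generalisation to any number of GPUs is my assumption based on the 2- and 4-GPU cases.

```shell
#!/bin/sh
# gpu_devices N: print the GPU_DEVICES value for N GPUs per node:
# "-1" for none, otherwise "0,1,...,N-1".
gpu_devices() {
    if [ "$1" -le 0 ]; then
        printf '%s\n' "-1"
        return 0
    fi
    list=0
    i=1
    while [ "$i" -lt "$1" ]; do
        list="$list,$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example:
#   GPU_DEVICES=$(gpu_devices 2); export GPU_DEVICES
```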

Add /opt/gamess_cuda to path:
echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc

Testing
cd /opt/gamess_cuda/tests/standard
gpurun exam44 12 2

409.A.GAMESS US with GPU support on debian wheezy. This works (probably).


Update 27/6/2013:
Please note that Kirill Berezovsky has published a series of posts on GAMESS US, including how to compile it for both CPU and GPU use. See
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions_26.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1687.html
http://biochemicalmatters.blogspot.ru/2013/06/gamess-us-frequently-asked-questions_1447.html
http://biochemicalmatters.blogspot.com.au/2013/06/gamess-us-frequently-asked-questions.html


Update 21 May 2013: See the comments below this post. This approach most likely works -- what has been confusing me is the lack of reports of GPU timings in the output, but this doesn't necessarily mean that the GPU isn't being used. The poster below this post, using nvidia-smi, observed GPU usage, although the speed-up was not major.


Update 10/05/2013: fixed libcchem compile.

Everything compiles fine and computations run fine and fast. To date there's only one other detailed step-by-step example of successful compilation of GAMESS with GPU support out there. At least based on google.

For various reasons I'm beginning to suspect that ATLAS isn't working out for me -- I've had calculations that fail to converge with ATLAS but work fine with ACML (see post B).

I was in part following http://combichem.blogspot.com.au/2011/02/compiling-gamess-with-cuda-gpu-support.html and ./libcchem/aaa.readme.1st

This took a while to hammer out, so the write-up is a bit messy.


Set up
sudo apt-get install libboost-all-dev build-essential g++ gfortran automake nvidia-cuda-toolkit python-cheetah openmpi-bin libopenmpi-dev zlib1g-dev checkinstall
mkdir ~/tmp

Get gamess (see e.g. http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html).

Put gamess-current.tar.gz in  ~/tmp

cd ~/tmp
tar xvf gamess-current.tar.gz
sudo mv gamess /opt/gamess_cuda
sudo chown $USER:$USER /opt/gamess_cuda -R


Preparing Boost
Edit /usr/include/boost/mpl/aux_/integral_wrapper.hpp
// other compilers (e.g. MSVC) are not particulary happy about it
#if BOOST_WORKAROUND(__EDG_VERSION__, <= 238) || defined(__CUDACC__)
typedef struct AUX_WRAPPER_NAME type;

Edit /usr/include/boost/mpl/size_t_fwd.hpp
BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_OPEN
#if defined(__CUDACC__)
typedef std::size_t std_size_t;
template< std_size_t N > struct size_t;
#else
template< std::size_t N > struct size_t;
#endif

BOOST_MPL_AUX_ADL_BARRIER_NAMESPACE_CLOSE

Edit /usr/include/boost/mpl/size_t.hpp
#if defined(__CUDACC__)
#define AUX_WRAPPER_VALUE_TYPE std_size_t
#define AUX_WRAPPER_NAME size_t
#define AUX_WRAPPER_PARAMS(N) std_size_t N
#else
#define AUX_WRAPPER_VALUE_TYPE std::size_t
#define AUX_WRAPPER_NAME size_t
#define AUX_WRAPPER_PARAMS(N) std::size_t N
#endif

HDF5
You'll have to compile this yourself for now, since H5Cpp.h (i.e. C++ support) is missing from the Debian packages.

mkdir ~/tmp
cd ~/tmp
wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.10-patch1.tar.gz
tar xvf hdf5-1.8.10-patch1.tar.gz
cd hdf5-1.8.10-patch1/
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6
./configure --prefix=/opt/gamess_cuda/hdf5 --with-pthread --enable-cxx --enable-threadsafe --enable-unsupported
make
mkdir /opt/gamess_cuda/hdf5/lib -p
mkdir /opt/gamess_cuda/hdf5/include -p
sudo checkinstall
This package will be built according to these values:

0 -  Maintainer: [ root@neon ]
1 -  Summary: [ hdf5-cxx]
2 -  Name:    [ hdf5-1.8.10 ]
3 -  Version: [ 1.8.10-1 ]
4 -  Release: [ 1 ]
5 -  License: [ GPL ]
6 -  Group:   [ checkinstall ]
7 -  Architecture: [ amd64 ]
8 -  Source location: [ hdf5-1.8.10-patch1 ]
9 -  Alternate source location: [  ]
10 - Requires: [  ]
11 - Provides: [ hdf5-1.8.10 ]
12 - Conflicts: [  ]
13 - Replaces: [  ]
Make sure to edit the Version field since Patch-1 leads to an error (must start with digit).

Openmpi 1.6
Can't remember why I ended up compiling it myself instead of using the stock debian version. From here.

sudo apt-get install build-essential gfortran
wget http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.tar.bz2
tar xvf openmpi-1.6.tar.bz2
cd openmpi-1.6/

sudo mkdir /opt/openmpi/
sudo chown ${USER} /opt/openmpi/
./configure --prefix=/opt/openmpi/1.6/ --with-sge

make
make install

compiling libcchem
cd /opt/gamess_cuda/libcchem
Edit /opt/gamess_cuda/libcchem/rysq/src/externals/boost/cuda/device_ptr.hpp
#include <cstdlib>
#include <iterator>
#include <stddef.h>

namespace boost {

Edit /opt/gamess_cuda/libcchem/src/externals/boost/cuda/device_ptr.hpp
#include <cstdlib>
#include <iterator>
#include <stddef.h>

namespace boost {
namespace cuda {

./configure --with-gamess --with-hdf5=/opt/gamess_cuda/hdf5 CPPFLAGS="-I/opt/gamess_cuda/hdf5/include" --with-cuda=/usr --disable-openmp --prefix=/opt/gamess_cuda/libcchem --with-gpu=fermi --with-integer8 --with-cublas
make
make install

Configure Gamess US
Mainly follow this: http://verahill.blogspot.com.au/2012/09/compiling-and-testing-gamess-us-on.html
cd /opt/gamess_cuda
./config
please enter your target machine name: linux64
GAMESS directory? [/opt/gamess_cuda] /opt/gamess_cuda
Setting up GAMESS compile and link for GMS_TARGET=linux64
GAMESS software is located at GMS_PATH=/opt/gamess_cuda
Please provide the name of the build locaation.
This may be the same location as the GAMESS directory.
GAMESS build directory? [/home/me/tmp/gamess]
Please provide a version number for the GAMESS executable.
This will be used as the middle part of the binary's name,
for example: gamess.00.x
Version? [00] 12r2
Please enter your choice of FORTRAN: gfortran
gfortran is very robust, so this is a wise choice.
Please type 'gfortran -dumpversion' or else 'gfortran -v' to
detect the version number of your gfortran.
This reply should be a string with at least two decimal points,
such as 4.1.2 or 4.6.1, or maybe even 4.4.2-12.
The reply may be labeled as a 'gcc' version,
but it is really your gfortran version.
Please enter only the first decimal place, such as 4.1 or 4.6: 4.6
Enter your choice of 'mkl' or 'atlas' or 'acml' or 'none': atlas
Please enter the Atlas subdirectory on your system: /opt/ATLAS/lib
Math library 'atlas' will be taken from /opt/ATLAS
If you have an expensive but fast network like Infiniband (IB), and
if you have an MPI library correctly installed, choose 'mpi'.
communication library ('sockets' or 'mpi')? mpi
Enter MPI library (impi, mvapich2, mpt, sockets): openmpi
Please enter your openmpi's location: /opt/openmpi/1.6

Build Gamess US
cd /opt/gamess_cuda/ddi/
./compddi
cd ../

Edit comp
# see ~/gamess/libcchem/aaa.readme.1st for more information
set GPUCODE=true
if ($GPUCODE == true) then

and

# -fno-whole-file suppresses argument's data type checking
set OPT='-O0'
if (".$GMS_DEBUG_FLAGS" != .) set OPT="$GMS_DEBUG_FLAGS"

./compall

Edit lked
#
set GPUCODE=true
#
# 5. optional MPQC interface

and

case openmpi:
   set MPILIBS="-L$GMS_MPI_PATH/lib"
   set MPILIBS="$MPILIBS -lmpi -lpthread"
   breaksw

and

if ($GPUCODE == true) then
   echo " Using 'libcchem' add-in C++ codes for Nvidia/CUDA GPUs."
   set GPU_LIBS="-L/opt/gamess_cuda/libcchem/lib -lcchem_gamess -lcchem -lrysq"
   set GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   ### GPU_LIBS="$GPU_LIBS -lcudart -lcublas"
   set GPU_LIBS="$GPU_LIBS /usr/lib/libboost_thread.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_cpp.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5_hl.a"
   set GPU_LIBS="$GPU_LIBS /opt/gamess_cuda/hdf5/lib/libhdf5.a"
   set GPU_LIBS="$GPU_LIBS /opt/ATLAS/lib/libcblas.a"
   set GPU_LIBS="$GPU_LIBS -lz"
   set GPU_LIBS="$GPU_LIBS -lstdc++"
   ### GPU_LIBS="$GPU_LIBS -lgomp"
   set GPU_LIBS="$GPU_LIBS -lpthread"
   echo " libcchem GPU code's libraries are"
   echo "$GPU_LIBS"
else

./lked gamess gpu.12

Create gpurun
#!/bin/csh
set TARGET=mpi
set SCR=$HOME/scratch
set USERSCR=/scratch
set GMSPATH=/opt/gamess_cuda
set JOB=$1
set VERNO=$2
set NCPUS=$3
@ NUMGPU=1
if ($NUMGPU > 0) then
   @ NUMCPU = $NCPUS - 1
   echo libcchem kernels will use $NUMCPU cores and $NUMGPU GPUs per node...
   set echo
   setenv CCHEM_PROFILE 1
   setenv NUM_THREADS $NCPUS
   #--if ($NUMGPU == 0) setenv GPU_DEVICES -1
   #--if ($NUMGPU == 2) setenv GPU_DEVICES 0,1
   #--if ($NUMGPU == 4) setenv GPU_DEVICES 0,1,2,3
   #setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
   ###### LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
   unset echo
else
   setenv GPU_DEVICES -1
endif
if ( $JOB:r.inp == $JOB ) set JOB=$JOB:r
echo "Copying input file $JOB.inp to your run's scratch directory..."
cp $JOB.inp $SCR/$JOB.F05
setenv TRAJECT $USERSCR/$JOB.trj
setenv RESTART $USERSCR/$JOB.rst
setenv INPUT $SCR/$JOB.F05
setenv PUNCH $USERSCR/$JOB.dat
if ( -e $TRAJECT ) rm $TRAJECT
if ( -e $PUNCH ) rm $PUNCH
if ( -e $RESTART ) rm $RESTART
source $GMSPATH/gms-files.csh
setenv LD_LIBRARY_PATH /opt/openmpi/lib:$LD_LIBRARY_PATH
set path= ( /opt/openmpi/bin $path )
mpiexec -n $NCPUS $GMSPATH/gamess.gpu.$VERNO.x|tee $JOB.out
cp $PUNCH .

echo 'export PATH=$PATH:/opt/gamess_cuda' >> ~/.bashrc
source ~/.bashrc
chmod +x gpurun
cd tests/standard/
gpurun exam44 12 2


The only evidence of GPU usage in the output is e.g. in exam44.out:
388           -----------------------
389           MP2 CONTROL INFORMATION
390           -----------------------
391           NACORE =        6  NBCORE =        6
392           LMOMP2 =        F  AOINTS = DUP
393           METHOD =        2  NWORD  =               0
394           MP2PRP =        F  OSPT   = NONE
395           CUTOFF = 1.00E-09  CPHFBS = BASISAO
396           CODE   = GPU
397 
398           NUMBER OF CORE -A-  ORBITALS =     6
399           NUMBER OF CORE -B-  ORBITALS =     6

but in the summary only CPU utilisation is mentioned.
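
Since the CODE field is the only marker in the output, it is handy to pull it out directly rather than scroll through the file. A minimal sketch, assuming the MP2 CONTROL INFORMATION block is laid out as in the excerpt above (mp2_code is a hypothetical helper name):

```shell
#!/bin/sh
# mp2_code FILE: print the value of the "CODE =" field from the
# MP2 CONTROL INFORMATION block of a GAMESS output file.
mp2_code() {
    sed -n 's/^[[:space:]]*CODE[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Example:
#   [ "$(mp2_code exam44.out)" = "GPU" ] && echo "libcchem MP2 code path selected"
```

Note that this only tells you which code path GAMESS selected, not how much work the card did.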



I modified rungms:

me@neon:/opt/gamess_cuda/tests/standard$ diff /opt/gamess_cuda/gpurungms /opt/gamess/rungms 
59,62c59,62
< set TARGET=mpi
< set SCR=$HOME/scratch
< set USERSCR=/scratch
< set GMSPATH=/opt/gamess_cuda
---
> set TARGET=sockets
> set SCR=/scr/$USER
> set USERSCR=~$USER/scr
> set GMSPATH=/u1/mike/gamess
67d66
< set NNODES=1
513c512
< set PPN=$3
---
>    set PPN=$4
601c600
<          @ PPN2 = $PPN
---
>          @ PPN2 = $PPN + $PPN
742c741
<    @ NUMGPU=1
---
>    @ NUMGPU=0
752c751
< #      setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
---
>       setenv LD_LIBRARY_PATH /share/apps/cuda/lib64:$LD_LIBRARY_PATH
793c792,793
<       /opt/openmpi/1.6/bin/mpiexec -n $NPROCS $GMSPATH/gamess.$VERNO.x < /dev/null
---
>       mpiexec.hydra -f $PROCFILE -n $NPROCS \
>             /home/mike/gamess/gamess.$VERNO.x < /dev/null

26 April 2013

396. Compiling gromacs 4.6 with gpu support, openblas and fftw3 on debian wheezy

NOTE: with ACML my performance on my FX8150 and FX8350 nodes is only 25% of that with Openblas (double precision). Yes, for some reason gromacs is four times faster with openblas than with the machine vendor libraries in my tests.

Here are the release notes: http://www.gromacs.org/About_Gromacs/Release_Notes/Versions_4.6.x
As far as I understand you don't have to rely on openmm anymore for CUDA. Yes, the PITA of compiling openmm is gone!

Note that GPU calcs only speed things up under certain, specific conditions -- and not all nvidia cards are supported (or equal). My own set-up, using passively cooled graphics cards, is definitely not appropriate for a GPU cluster. Once nwchem comes out with GPU support I might upgrade to fancier $200 graphics cards (maybe COSMO in NWChem will finally become more reasonable in terms of computational cost), but there's little reason for that at the moment.

Not all cards are created equal either -- e.g. the GT210, which has compute capability 1.2, is too weak to run gromacs. The GT430 (compute capability 2.1) works. Neither is viable for professional work.

Also note that it seems that you still need to use OPENMM if you want GPU support for implicit solvation.

Gromacs used to be easy to install. It's become a fair bit more complicated between 4.5.5 and 4.6. See here for gromacs 4.5.5: http://verahill.blogspot.com.au/2012/05/gromacs-with-external-fftw3-and-blas-on.html

CUDA: If you want to build with cuda you need gcc-4.6, which is still available in the wheezy repos. 4.7 won't work. Luckily, you can have both on your system, but you'll need to specify CC and CXX as shown below.
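
Before exporting CC and CXX it can be worth confirming that the compiler you point at really is a 4.6 release. A small sketch follows; is_gcc46 is a hypothetical helper, and it deliberately accepts only 4.6 and 4.6.x because that is the only version this post has been tested with.

```shell
#!/bin/sh
# is_gcc46 VERSION: succeed only for "4.6" or "4.6.x" version strings,
# as printed by `gcc -dumpversion`.
is_gcc46() {
    case "$1" in
        4.6|4.6.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Example:
#   if is_gcc46 "$(/usr/bin/gcc-4.6 -dumpversion)"; then
#       export CC=/usr/bin/gcc-4.6 CXX=/usr/bin/g++-4.6
#   fi
```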

Openblas
Note that the links to the openblas file tend to die after a while, so you might have to download it manually.

sudo mkdir /opt/openblas
sudo chown ${USER} /opt/openblas
cd ~/tmp
wget http://github.com/xianyi/OpenBLAS/tarball/v0.2.6
tar xvf v0.2.6
cd xianyi-OpenBLAS-87b4d0c/
wget http://www.netlib.org/lapack/lapack-3.4.1.tgz
make all BINARY=64 CC=/usr/bin/gcc FC=/usr/bin/gfortran USE_THREAD=0 INTERFACE64=1 1> make.log 2>make.err
make PREFIX=/opt/openblas install
cp lib*.*  /opt/openblas/lib

add
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openblas/lib
to your ~/.bashrc [for later use with nwchem and ecce, add /opt/openblas/lib to /etc/ld.so.conf and do sudo ldconfig -- you might want to make libopenblas.so and libopenblas.so.0 sym links to the main lib, libopenblas_bulldozer-r0.2.6.so]

single-precision gromacs 4.6 with both CPU and GPU

CUDA
If you have an nvidia card and want to enable GPU calcs, do
sudo apt-get install nvidia-cuda-toolkit gcc-4.6 g++-4.6

If /usr/lib/libcuda.so is nothing but a symlink to /usr/lib/libcuda.so.1, and the file /usr/lib/libcuda.so.1 is missing (this was the case on my wheezy amd64), then do
sudo rm /usr/lib/libcuda.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/libcuda.so

You can also simply make sure that there's no /usr/lib/libcuda.so.
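
To find out whether you are affected, test for a symlink whose target does not exist. A minimal sketch (is_dangling is a hypothetical helper name):

```shell
#!/bin/sh
# is_dangling PATH: succeed if PATH is a symlink whose target is missing.
# -L tests the link itself, -e follows it to the target.
is_dangling() {
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Example:
#   is_dangling /usr/lib/libcuda.so && echo "libcuda.so points at a missing file"
```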

Continue with the gromacs compilation:
cd ~/tmp
sudo apt-get install cmake
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.6.tar.gz
tar xvf gromacs-4.6.tar.gz
mkdir build_gromacs46
cd build_gromacs46
sudo mkdir /opt/gromacs
sudo chown ${USER} /opt/gromacs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export LDFLAGS="-L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/openblas/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_single -DGMX_EXTERNAL_BLAS=/opt/openblas/lib ../gromacs-4.6
make
make install

Note: for acml I used this instead:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib
export LDFLAGS="-L/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib -lacml"
export CPPFLAGS="-I/opt/acml/acml5.2.0/gfortran64_fma4_int64/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_single -DGMX_EXTERNAL_BLAS=/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib ../gromacs-4.6


Double-precision gromacs without GPU acceleration:

cd ~/tmp/build_gromacs46
rm * -rf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/openblas/lib
export LDFLAGS="-L/opt/openblas/lib -lopenblas"
export CPPFLAGS="-I/opt/openblas/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=on -DGMX_GPU=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_double -DGMX_EXTERNAL_BLAS=/opt/openblas/lib ../gromacs-4.6
make
make install

Note: for acml I used this instead:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib:/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib
export LDFLAGS="-L/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib -lacml"
export CPPFLAGS="-I/opt/acml/acml5.2.0/gfortran64_fma4_int64/include"
export CC=/usr/bin/gcc-4.6 && export CXX=/usr/bin/g++-4.6 && cmake -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=On -DGMX_DOUBLE=on -DGMX_GPU=off -DCMAKE_INSTALL_PREFIX=/opt/gromacs/gromacs4.6_double -DGMX_EXTERNAL_BLAS=/opt/acml/acml5.2.0/gfortran64_fma4_int64/lib ../gromacs-4.6

Add gromacs to path:
echo 'export PATH=$PATH:/opt/gromacs/gromacs4.6_single/bin:/opt/gromacs/gromacs4.6_double/bin' >> ~/.bashrc

Switching between GPU and CPU
You can use the same binary for both, but remember that only the single precision binaries have GPU support to begin with. To set gpu vs cpu, use the -nb option in mdrun:
-nb enum auto Calculate non-bonded interactions on: auto, cpu, gpu or gpu_cpu


Quick test:
cd ~/tmp
wget http://www.gromacs.org/@api/deki/files/128/=gromacs-gpubench-dhfr.tar.gz
tar xvf \=gromacs-gpubench-dhfr.tar.gz
cd dhfr/GPU/dhfr-solv-PME.bench
mdrun -nb cpu -s topol.tpr -testverlet

Hit ctrl+c to stop and get statistics. Then try
mdrun -nb gpu -s topol.tpr -testverlet

I got
XPU       ns/day
----------------
auto      7.4
GPU       7.7
CPU       4.1
gpu_cpu   7.5

where I have a 3 core 3.1 GHz AMD Athlon II X3 445 CPU and an NVIDIA GeForce GT 430 graphics card -- neither of which is anything special.

Note also that the ns/day values depended highly on how long I let the calc run, and as I didn't time it and make them run the same amount of time, I suspect that auto, GPU and gpu_cpu are all about the same.
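
A fairer comparison would run every mode for the same number of steps and read the ns/day figure from the log. The sketch below rests on two assumptions that you should check against mdrun -h and your own md.log: that this mdrun accepts -nsteps and -g, and that the log contains a line of the form "Performance: <ns/day> <hour/ns>". The helper name ns_per_day is hypothetical.

```shell
#!/bin/sh
# ns_per_day LOG: print the ns/day figure from an mdrun log.
# Assumption: the log has a line "Performance: <ns/day> <hour/ns>";
# if several such lines exist, the last one wins.
ns_per_day() {
    awk '$1 == "Performance:" { value = $2 }
         END { if (value != "") print value }' "$1"
}

# Example (assumption: -nsteps and -g are available in this mdrun):
#   for mode in auto cpu gpu gpu_cpu; do
#       mdrun -nb "$mode" -s topol.tpr -testverlet -nsteps 5000 -g "$mode.log"
#       echo "$mode $(ns_per_day "$mode.log")"
#   done
```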