Showing posts with label PNNL. Show all posts
Showing posts with label PNNL. Show all posts

23 October 2012

264. Upgrade to ECCE 6.4 on Debian Testing

There's little reason to upgrade from 6.3 to 6.4 since
Other than open source availability and bundling the latest NWChem 6.1.1, the ECCE 6.4 release is otherwise equivalent to the 6.3 release.  
But it's always nice with some new and shiny.

For general instructions on how to install ecce from scratch, see e.g. http://verahill.blogspot.com.au/2012/06/ecce-in-virtual-machine-step-by-step.html

Upgrading
Go here to download the latest release. You will be asked to supply your name, email address and the name of your institution. However, you no longer need to register.

Download the file install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh (full binary + builder). And yes, you can use wget for that.
Stop the ecce server if it's running:
 ~/.ecce/ecce-6.3e/server/ecce-admin/stop_ecce_server 

Make your ecce install file executable and run it:
chmod +x install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh
./install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh

which launches the installation:
Extracting ECCE distribution from ./install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh...

Main ECCE installation menu
===========================
1) Help on main menu options
2) Prerequisite software check
3) Full install
4) Full upgrade
5) Application software install
6) Application software upgrade
7) Server install
8) Server upgrade

IMPORTANT: If you are uncertain about any aspect of installing
or running ECCE at your site, please refer to the detailed
ECCE Installation and Administration Guide at 
http://ecce.pnl.gov/docs/installation/2864B-Installation.pdf

Hit  at prompts to accept the default value in brackets.

Selection: [1] 4
Host name: [beryllium] 
New application installation directory: [/home/me/tmp/ecce-v6.4/apps] /home/me/.ecce/ecce-v6.4/apps
Existing application directory to upgrade: /home/me/.ecce/ecce-v6.3e/apps

Backup existing server user data (yes/no)? [yes]

ECCE v6.4 will be installed using the settings:

  Installation type: [full upgrade]
  Host name: [beryllium]
  Application installation directory: [/home/me/.ecce/ecce-v6.4/apps]
  Application directory to upgrade: [/home/me/.ecce/ecce-6.3e/apps]
  Server installation directory: [/home/me/.ecce/ecce-v6.4/server]
  Server directory to upgrade: [/home/me/.ecce/ecce-6.3e/server]
  Backup existing server user data: [yes]

Are these choices correct (yes/no/quit)? [yes] 
Installing ECCE application software in /home/me/.ecce/ecce-v6.4/apps...
  Extracting application distribution...
  Extracting NWChem binary distribution...
  Extracting NWChem common distribution...
  Extracting client WebHelp distribution...
  Configuring application software...
  Configuring NWChem...

Installing ECCE server in /home/me/.ecce/ecce-v6.4/server...
  Extracting data server in /home/me/.ecce/ecce-v6.4/server/httpd...
  Extracting data libraries in /home/me/.ecce/ecce-v6.4/server/data...
  Extracting Java Messaging Server in /home/me/.ecce/ecce-v6.4/server/activemq...
  Configuring ECCE server...
  Copying user data from server to be upgraded...
  Copying share data from server to be upgraded...

ECCE installation succeeded.

***************************************************************
!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:
     /home/me/.ecce/ecce-v6.4/server/ecce-admin/start_ecce_server

-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals
   at http://ecce.pnl.gov/using/installguide.shtml

-- Before running ECCE each user must source an environment
   setup script.  For csh/tcsh users add this to ~/.cshrc:
     if ( -e /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup ) then
       source /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup
     endif
   For sh/bash users, add this to ~/.profile or ~/.bashrc:
     if [ -e /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup.sh ]; then
       . /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup.sh
     fi
***************************************************************

And then
/home/me/.ecce/ecce-v6.4/server/ecce-admin/start_ecce_server
/home/me/.ecce/ecce-v6.4/server/httpd/bin/apachectl start: httpd started
[1] 25382
INFO  BrokerService         
- ActiveMQ 5.1.0 JMS Message Broker (localhost) is starting
INFO  BrokerService      
- ActiveMQ JMS Message Broker (localhost, ID:beryllium-46481-1350964505499-0:0) started

Put the following in your ~/.bashrc

export ECCE_HOME=/home/me/.ecce/ecce-v6.4/apps
export PATH=${ECCE_HOME}/scripts:${ECCE_HOME}/scripts/parsers:${PATH}$

And run:
source ~/.bashrc
ecce

07 June 2012

179. Building ECCE on Debian Testing/Wheezy

UPDATE: Build went fine. Upgrade went fine. But the organizer doesn't show my jobs properly i.e. the files are there but they aren't recognised as jobs. I haven't had a solid look at this yet, and it might just be because I need to restart more services than just the http server. It's been a long day...

UPDATE 2: An update on a different computer went without a hitch, with all the old job files being imported properly.

UPDATE 3: I started ECCE and let the data manger chew on it for four hours. No luck on the troublesome computer. The only difference is the java -- openjdk 7 worked fine, oracle jre/jdk didn't. Dunno if this is the reason, but currently in the process of installing the binaries to see whether that works better. Updates will come...

UPDATE 4: Installing the prebuilt ECCE binaries did the trick. In summary: as far as I know you MUST use openjdk 7. SUN/Oracle Java does not appear to work. It's exhibited in a lack of ability to recognise old jobs as being...jobs rather than just folders and files.


POST BEGINS HERE:
I'm trying to document everything I'm doing these days, no matter how simple or (at least in retrospect) obvious it is.

Here's how to build the 4th of June 2012 version of ecce v6.3.

You need to be registered with EMSL/PNNL to download ecce. There are plans to open-source the software properly (i.e. no need to register) sometime this coming northern summer. But for now you need to be an academic group leader to have access.

(I originally posted a somewhat different post where I recommended making some changes to the build scripts re ECCE_HOME. Eventually I saw the light and realised the error of my ways)

Download ecce-v6.3.src.tar.bz2, and put it in a suitable folder, e.g. ~/tmp
cd ~/tmp
tar -xvf ecce-v6.3-src.tar.bz2
cd ecce-v6.3/
export ECCE_HOME=`pwd`
cd build/
./build_ecce

The first time you run ./build_ecce you'll be asked a series of questions relating to installed packages. If it's all good, answer
Do you want to skip these checks for future build_ecce invocations (y/n)? y
If anything came up, then read the message carefully and install the missing package.

NOTE: on one box I noticed different version of java and javac being found, as I had both openjdk 6 and 7 installed. I couldn't set javac to 6 but I could do
sudo update-alternatives --config java
and set it to openjdk 7.

[From my small, statistically unsound sample set Oracle/SUN java will NOT work.]

Then do ./build_ecce again. And again. And again. In all, I think you do it six or seven times - each time a new package is built.
I always get a
lib: No such file or directory.
at the end of the httpd build. Not sure why, but everything seems to be ok in spite of that.
Anyway, you know that you're done running ./build_ecce when you get
ECCE built and distribution created in /home/verahill/tmp/ecce-v6.3
At this point, you are ready to install
DO NOT USE ./install_ecce

GO UP ONE LEVEL AND DO
./install_ecce.v6.3.csh

But that's a different story. install_ecce will give you weird error messages about missing tar files. install_ecce.v6.3.csh on the other hand will work fine.

01 June 2012

169. ECCE, node hopping and cluster nodes without direct access to WAN

NOTE: I'm still working on this. But this sort of works for now. More details coming soon.
Update:  I've posted better solutions more recently for use with SGE. See aspects of e.g. http://verahill.blogspot.com.au/2012/06/ecce-in-virtual-machine-step-by-step.html and http://verahill.blogspot.com.au/2012/06/ecce-and-inaccessible-cluster-nodes.html

It's not easy choosing a good title for this post which describes the purpose of it succinctly yet clearly (sort of like this sentence), so here's what we're dealing with:

The Problem:
You can access the submit node from off-site. You can't access the subnodes directly from off-site. This post shows you how you can submit to each subnode directly. A better technical solution is obviously to use qsub on the main node. Having said that, with very little modification this method can also be adapted to the situation described here: http://verahill.blogspot.com.au/2012/05/port-redirection-with-eccenwchem.html

A low-tech, home-built example is the following:
-eth0 - Main  node - eth1 - (subnodes: node0, node1, node2, node3)/

eth0 is WAN, and eth1 is LAN. You can ssh to the main node from the 'internet'. There's no queue manager like SGE installed -- submission of jobs is done by logging onto each node and executing the commands there.

The example:
Main node:
    my.cluster.edu
Nodes:
   compute-0-0
   compute-0-1
   compute-0-2
   compute-0-3


The most obvious solution:
do port redirection. The downside is that it requires some technical skills of the users, and anything with networking and ssh is a PITA if they insist on using windows.


The smarter solution:
I was informed of this solution here.

ROCKS-specific stuff
If your cluster is running ROCKS 5.4.3 and you're having issues opening csh shells, just move /etc/csh.cshrc out of the way as a crude fix.
sudo mv /etc/csh.cshrc ~/
Don't forget to do this on all the nodes if they have local /etc folders! And it's not that easy -- the passwords aren't the same on the nodes as on the main node.
So, on the main node:
rocks set host sec_attr compute-0-1 attr=root_pw
sudo su
# cat /etc/hosts
#ssh 10.1.255.253
#mv /etc/csh.cshrc .
#exit

As the user you'll be running as, edit/create your ~/.cshrc

setenv GAUSS_SCRDIR /tmp
setenv ECCE_HOME /share/apps/ecce/apps
set PATH= (/share/apps/nwchem/nwchem-6.1/bin/LINUX64 $PATH)
setenv LD_LIBRARY_PATH /opt/openmpi/lib:/share/apps/openblas/lib
Repeat on all nodes.

SIDE NOTE: What I'm ultimately looking to achieve on the ROCKS cluster is front-node managed SGE submissions. Easy, you say? Well, ECCE submits a
setenv PATH /opt/gridengine/bin/lx26-amd64/:/usr/bin/:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/bin/X11:${PATH}
and for some reason the ROCKS/CentOS (t)csh can't handle (most of the time) adding ${PATH} to itself since it's too long unless "s are added (it seems) i.e. this works consistently:
setenv PATH "/opt/gridengine/bin/lx26-amd64/:/usr/bin/:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/bin/X11:${PATH}"
On my debian system either works fine (bsd-csh).

ANYWAY -- continue

Ecce:
Start setting it up using
ecce -admin



After that, do some editing in ecce-6.3/apps/siteconfig/

CONFIG.voldemort:
xappsPath: /usr/X11R6/bin
NWChem: /share/apps/nwchem/nwchem-6.1/bin/LINUX64/nwchem
Gaussian-03: /share/apps/gaussian/g09/g09
sourceFile:  /etc/csh.login
frontendMachine: my.cluster.edu

Machines:
compute-0-0 compute-0-0.local Dell HPCS3 Phase 1 Cluster AMD 2.6 GHz Opteron 40:5 ssh :NWChem:Gaussian-03MN:RD:SD:UN:PW




05 May 2012

139. compiling nwchem with custom ATLAS on debian

I'm not yet sure, but there seems to be something weird with this build. Use the openblas build here preferentially:  http://verahill.blogspot.com.au/2012/05/nwchem-with-openblas.html

i.e. do not use this build without testing it against builds without external blas/atlas and against other libraries.

I am not an expert

The atlas build is pretty much like what was described here:
http://verahill.blogspot.com.au/2012/05/compile-atlas-blas-on-debian-testing.html


Build atlas
Check you cpu governor:

cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor 
ondemand
Next, set it to performance for all cores (example for three-core AMD)
sudo cpufreq-set -g performance
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor

Prepare the install directory:
sudo mkdir /opt/ATLAS
sudo chown ${USER} /opt/ATLAS

Prepare the compile directory
mkdir ~/tmp
cd ~/tmp
 wget http://downloads.sourceforge.net/project/math-atlas/Developer%20%28unstable%29/3.9.72/atlas3.9.72.tar.bz2
tar xvf atlas3.9.72.tar.bz2
cd ATLAS/
mkdir build/
vim Make.top 
change -V on line 6 to -v

cd build
.././configure --prefix=/opt/ATLAS

There's any level of detail in what you can pass to configure. -A AMD64 would be one option. Look at the bottom for more info.

make
make install
cp lib/lib* /opt/ATLAS/lib

Once it's all done, set the cpu governor back if you so desire.
sudo cpufreq-set -g ondemand
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
sudo cp /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor


nwchem
For python support you need to edit src/config/makefile.h (see http://verahill.blogspot.com.au/2012/04/adding-python-support-to-nwchem-under.html)

export LARGE_FILES=TRUE
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/opt/nwchem/nwchem-6.0
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES="all python"
export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export BLASOPT="-L/opt/ATLAS/lib -latlas -lblas -llapack"
export USE_MPI=y
export USE_MPIF=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make FC=gfortran

Note: don't use HAS_BLAS=y. It gave me no end of grief.

Also, IF you binary executes without a hitch at this point it means that you're libs are not being loaded.  You need to put
export LD_LIBRARY_PATH=/opt/ATLAS/lib:$LD_LIBRARY_PATH
in your ~/.bashrc

Building takes a while, but should go ok. Make sure you have python2.7-dev installed, and if you're using a different python version, make sure to change the options above.




Here's also an example ~/.nwchemrc file for no particular reason at all -- if you've got ecce installed you might want to make sure yours is correct.
nwchem_basis_library /opt/nwchem/nwchem-6.0/src/basis/libraries/
nwchem_nwpw_library /opt/nwchem/nwchem-6.0/src/nwps/libraryps/
ffield amber
amber_1 /opt/nwchem/nwchem-6.0/src/data/amber_s/
amber_2 /opt/nwchem/nwchem-6.0/src/data/amber_x/
spce /opt/nwchem/nwchem-6.0/src/data/solvents/spce.rst
charmm_s /opt/nwchem/nwchem-6.0/src/data/charmm_s/
charmm_x /opt/nwchem/nwchem-6.0/src/data/charmm_x/




You can get a list over compile options for ATLAS using
make xprint_enums ; ./xprint_enums

after you've done .././configure

See http://sourceforge.net/tracker/index.php?func=detail&aid=3021404&group_id=23725&atid=379483


19 March 2012

111. Ecce (nwchem) on Debian, and ROCKS/Centos

If you're using nwchem chances are that you've considered using ECCE to parse the output:
http://ecce.emsl.pnl.gov/

First of all you'll need to register at https://eus.emsl.pnl.gov/Portal/ -- and you can only do that if you're faculty. Postdocs and PhD students need not apply. Other than that, it's free, but you'll have to wait a couple of days to get your registration approved.

As much as I like nwchem owing to the clear syntax, I feel less warmly about ecce. Don't get me wrong -- it's pretty. It's just feels archaic and cobbled together. Even worse is that it's not open source and that its workings feel a bit opaque at times. Still, there's no better program for visually parsing nwchem output at this point. Anyway...

--start here --
Debian:
Download the install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh file to ~/tmp/ecce

There's no md5sum supplied but here's what I got:
2ee70cc817dee9f80b11be5eac6e53e5

If you haven't already
sudo apt-get install csh 

OK, moving on...
cd ~/tmp/ecce
chmod +x  install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh


Main ECCE installation menu
===========================
0) Help on main menu options
1) Full install
2) Full upgrade
3) Application software install
4) Application software upgrade
5) Server install
6) Server upgrade

Pick 1 if you're installing on your desktop and there's no server that you know of. 

Once the installation is over you get:
***************************************************************
!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:
     /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals
   at http://ecce.pnl.gov/using/installguide.shtml
-- To run ECCE each user must source either the runtime_setup
   (csh/tcsh) or runtime_setup.sh (sh/bash/ksh) script in the
   directory /home/me/tmp/ecce/ecce-v6.2/apps/scripts
   from their shell environment setup script.  For example,
   with csh or tcsh, add the following to ~/.cshrc:
     if (-e /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup) then
       source /home/me/tmp/ecce/ecce-v6.2/apps/scripts/runtime_setup
     endif
***************************************************************
Which translates to:
1. sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
2. Sourcing that file makes no sense. Instead, add the following to your ~/.bashrc
export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}

Assuming you've source your ~/.bashrc, start ecce by typing
ecce

...which takes an unreasonably long time (ca 1 min) after which you're greeted by
Press Any Key
Type in a password -- any password -- which will be your password from now on.
You're then taken to
Click on Viewer (assuming you've got something to look at)
Pay attention to the fine print
Have a look at the text box in the bottom right corner..and pay attention. In my particular case I have 6 cores and an mpi aware nwchem 6.0 version compiled. I bet that's better than whatever comes bundled with ecce. Also, the

To change you go to the machine browser (see screen shot #2), click on set up remote access and make sure that everything is working by clicking on e.g. processes:

Then click on the Machine menu (top left), select Register Machine while your machine is selected.
You can now change your options.

Running:
So, before using ecce you always need to
sh  /home/me/tmp/ecce/ecce-v6.2/server/ecce-utils/start_ecce_server
first. The server will run until you stop it or reboot.
Next, start ecce
ecce

Integration with nwchem
Most people would probably set up their nwchem jobs by hand, because it's so simple. All you need to do is to include the statement
ecce_print ecce.out
in the beginning, and you'll get an ecce.out file which you can then IMPORT (not open regularly, but import) into ecce.

Click on Viewer, Import Calculation From Output File, select your ecce out and voilá:
ECCE: homo (benzene)
If you're running debian, you're done now.



ROCKS 5.4.3/Centos 5.6:
This isn't a fix as much as a rant. The problem with ROCKS 5.4.3 is that csh is so broken that it's a struggle just to install ecce. I mean, I do show how to get ecce running in the end, but ROCKS feels like an unfinished piece of work compared to a normal debian install.

--Demonstration only -- don't do --
First back up ssh-key.sh and ssh-key.csh in /etc/profile.d

So...you start by
chmod +x install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
./install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh
...and nothing's happening.

You then try just typing in
csh

/etc/profile.d/ssh-key.sh: line 211: return: can only `return' from a function or sourced script
It appears that you have not set up your ssh key.
This process will make the files:
     /export/home/me/.ssh/id_rsa.pub
     /export/home/me/.ssh/id_rsa
     /export/home/me/.ssh/authorized_keys
Generating public/private rsa key pair.
/export/home/me/.ssh/id_rsa already exists.
Overwrite (y/n)? 

Turns out there's a bug in ROCKS 5.4.3.  You can fix that by:
rpm -Uvh ftp://www.rocksclusters.org/pub/rocks/updates/5.4.3/x86_64/RPMS/rocks-config-server-5.4.3-1.x86_64.rpm

So far so good.
csh
...and nothing. It just exits. Or so you think. But the problem is bigger than that --  try opening a new terminal in e.g. gnome (gnome-terminal or xterm) -- it exits immediately. No error message or anything.

You can get csh to start by moving /etc/csh.cshrc out of the way, but you're still screwed as to opening a new terminal. The only way to get back a working system is to restore ssh-key.sh and ssh-key.csh.

--- Demonstration over ---

--Start here --
 You could also get around all this by running
csh -f
But then you don't have any env. variables loading and it can lead to problems of its own.

Anyway:
csh -f install_ecce.v6.2.rhel5-gcc3.2.3-m32.csh

The install starts. Just follow the instructions.

After installation, start the server:
csh -f ecce-v6.2/server/ecce-utils/start_ecce_server

Hit enter until you get a workable prompt back...
Edit your ~/.bashrc and add

export ECCE_HOME=/home/me/tmp/ecce/ecce-v6.2/apps
export PATH=${ECCE_HOME}/scripts:${PATH}

Don't bother sourcing your ~/.bashrc. It's easier to just open a new terminal.
Type
ecce
and you should be up and running...sort of. Under ROCKS I had problems importing ecce.out files since I had problems actually connecting to the server. Don't know why, but it came down to not being able to open a remote shell on the host.

NOTE:
this worked fine on one box, but not on another one which I was setting up remotely. On that one I had to edit

ecce/apps/siteconfig/Dataservers
and
ecce/apps/siteconfig/jndi.properties 

In particular, I had to change references to eccetera.emsl.pnl.gov.