Getting Started with the CEES Grid

The CEES Grid is accessed via the cluster head nodes. You log in with your SUNetID and SUNetID password. The following information is also available in PDF format.

Logging in to CEES

  • You must use SSH or SFTP (on Windows, use SecureCRT or SecureFX).
  • You must log in from a Stanford host. If you wish to log in from outside Stanford, you must first run the VPN software. A sample login command is shown below.
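For example, a login from a Stanford host looks like the following (the head node name here is a placeholder; substitute the actual host name for the cluster you have access to):

% ssh <your SUNetID>@<head node>.stanford.edu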

The Clusters in the CEES Grid

The CEES Grid is composed of two resource clusters, each with its own head nodes, compute nodes, and Tool nodes. They are located in different buildings and do not share file systems. The clusters can only be used by vested users; the Tool servers can be used by anyone in SES.

Documentation
Linux

  • CEES uses a Red Hat 'clone' called CentOS
  • Red Hat docs are at http://redhat.com/docs/
  • CentOS docs are at http://centos.org/
  • Look for software documentation in its install directory (usually /usr or /usr/local)
  • Try the man command (not all software has man pages); see the example below
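For example, to read the man page for the batch submission command described in the next section:

% man qsub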

Basic Commands and Examples
Examples and documents about PBS/Torque and the MPI packages can be found on the web. All jobs are submitted to PBS/Torque via a job script which contains various definitions (such as a parallel environment), options, and the location of your binary executable program.

Some basic commands (see the man pages for more options; a short example session follows the list):
showq - show all running and queued jobs
checkjob <job#> - display information about a specific job. You can display only your own jobs.
pbsnodes - display details about the cluster nodes.
qsub - submits a job script to the batch system. After submission, the batch system will execute the job on the cluster and return the results to you. Examples of job scripts are below.
qstat - checks status of jobs. The command 'qstat -Qf' will give you a list of the queues and the configuration of each (including who has access).
qdel - deletes a job. Takes the job number as the argument.
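
A minimal example session might look like this (the job number 12345 is hypothetical; use the number reported for your own job):

% showq
% checkjob 12345
% qdel 12345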

EXAMPLE: an MPI version of 'hello world'
Example scripts and sample runs can be found in /data/cees/dennis; the following example is in the directory 'HelloWorld'. Cluster jobs must be submitted from a directory in /data; they will not run if submitted from /home. PBS creates two files for stdout and stderr in the job submission directory, and because /home is mounted read-only on the compute nodes, those files cannot be created there and the job will immediately fail. You can use /home for output on the Tool nodes, but /data is preferable (no quota, and faster).
Create a directory in /data/cees:
% cd /data/cees
% mkdir <your SUNetID>
% cd /data/cees/<your SUNetID>
The name you choose for the /data/cees directory can be any unused name.
Using your favorite editor, create a program file using the following code, and name the file 'hellompi.c'.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid, numprocs;

    MPI_Init(&argc, &argv);                    /* start up the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* rank of this process */
    printf("Hello World! I am #%d of %d procs\n", myid, numprocs);
    MPI_Finalize();                           /* shut down MPI */
    return 0;
}
Compile it using the command
% mpicc -o hellompi hellompi.c
Using your favorite editor, create a script file using the following as a guideline, and name the script 'testrun.sh'. Make sure you add your SUNetID and directory path.
#!/bin/tcsh
#PBS -N TestJob
#PBS -l nodes=1:ppn=8
#PBS -q default
#PBS -V
#PBS -m e
## NOTE: the following line is not used on the '2012' Cluster!!
#PBS -W x="PARTITION:sw121"
#PBS -M <YOUR SUNETID>@stanford.edu
#PBS -e /data/cees/<YOUR SUNETID>/test.err
#PBS -o /data/cees/<YOUR SUNETID>/test.out
#
cd $PBS_O_WORKDIR
#
mpirun hellompi >> OUT
# end script

Description of script:
Line 1: shell to execute under
Line 2: name of job. This is displayed in showq.
Line 3: nodes and cores. This line requests 8 cores on one node (note that each of our compute nodes has 8 cores, so this command requests a 'whole' node).
Line 4: which queue to run in. Make sure you have access. Here we are using the default queue, which contains all the nodes in the cluster and has a 2 hour time limit for jobs.
Lines 5 and 6: use these as shown. '-V' exports your current environment variables to the job, and '-m e' sends mail when the job ends; see the qsub man page for details.
Line 7: comment noting that the next line applies only to the '2010' Cluster.
Line 8: this line is required to specify the network partition on the '2010' Cluster; it is not used on the '2012' Cluster (see the note on line 7).
Line 9: Maui will email you at this address when the job is done.
Lines 10 and 11: path names for the error log and output log. Important: these cannot be in your home directory. The /home partition is mounted read-only on the compute nodes, and trying to use it will cause this script to fail.
Line 12: comment line
Line 13: 'cd' to the working directory (the directory you are submitting the job from)
Line 14: comment line
Line 15: the command to execute, plus any options. This line will vary according to the application. Most programs will not need to specify the number of slots and the hostfile to mpirun. However, if you receive an error along the lines of 'found only 1 processor', try the following format:
mpirun -np [# of procs] -machinefile $PBS_NODEFILE [program and input] >> OUT
Note that the number of procs must agree with the '-l' line.
Example: mpirun -np 8 -machinefile $PBS_NODEFILE hellompi >> OUT
Line 16: end of script

Now submit the job:
% qsub testrun.sh
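qsub prints the job identifier when the job is accepted. While the job is waiting or running, you can watch it with the commands described earlier, for example:

% showq
% checkjob <job#>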
When the job finishes, PBS will send you email. The test.err file will be empty if everything went correctly; if not, it will contain useful debugging information.
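Once the job has finished, you can inspect the program output (the OUT file in the directory you submitted from) and the PBS log files named on the -e and -o lines of the script, for example:

% cat OUT
% cat /data/cees/<YOUR SUNETID>/test.err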

Please email Dennis Michael if you have any difficulties, questions or problems.