Stanford University
School of Earth Sciences home

Users

Getting started with the CEES Grid (pdf format)

February 2007

CEES HPTC Manager: Dennis Michael, dennis@stanford.edu, 723-2014, Mitchell Building room 415.

Account requests, software requests, and problem reports can be made via the forms in the drop-down menu under 'HPTC Facilities'.

The CEES Grid is accessed via the cluster head nodes described below. You will use your SUNetID and your SUNetID password to log in.

Using the clusters in the CEES Grid is relatively simple once you learn a few basic commands. The CEES Grid uses a batch processing system: you log in to the head node of a cluster, compile your code, and then submit a job script that runs your program to the Grid Engine. The Grid Engine schedules your job on the appropriate compute nodes and, when the job completes, writes the results to your directory or another location of your choice.

Examples of how to run a job are given in the 'Basic Commands and Examples' section. Your account has already been set up with the environment you need to get started.

Overview and Definitions

The Grid Engine is the software we use on the Grid for job scheduling and resource management. It is the Sun N1 Grid Engine 6, referred to as SGE.

A grid is a collection of computing resources that performs tasks and appears to users as one large system, providing a single point of access to distributed resources or clusters of resources.

A cluster is a collection of individual computers, a network connecting those computers, and software that enables a computer to share work among the other computers via the network.

The Clusters in the CEES Grid

The CEES Grid is composed of three resource clusters, each with its own head node. The Sparc and Opteron clusters also have separate compute nodes; on the Tool Cluster, the head node is also the compute node.

CEES Sparc Cluster

Operating System: Solaris 10

Head node: cees-sparc.stanford.edu

  • Hardware: Sun 490 with 4 dual-core Sparc IV CPUs and 16GB of memory

Number of compute nodes in cluster: 1

  • Compute node hardware: Sun 6900 with 24 dual-core Sparc IV CPUs and 192 GB of memory

Software:

  • Sun Studio 11 compilers and tools. Sun Studio 10 is also available.
  • Some GNU and open source programs
  • Sun HPC tools

The Sparc Cluster is best suited for SMP programs.

CEES Opteron Cluster

Operating System: CentOS Linux (same as Red Hat 4 AS)

Head node: cees-opteron.stanford.edu

  • Hardware: Sun v40z with 4 dual-core Opteron CPUs and 16GB of memory

Number of compute nodes: 64

  • 32 Sun v20z with 2 Opteron CPUs and 2GB of memory
  • 32 Sun v20z with 2 Opteron CPUs and 8GB of memory

Software:

  • GNU and open source software and compilers
  • Pathscale FORTRAN compiler (/usr/local/pathscale)
  • MPICH (/usr/local/mpich)

The Opteron Cluster is best suited for distributed programs using MPI.

CEES Tool Cluster

Operating System: CentOS Linux (same as Red Hat 4 AS)

Head node: cees-tool.stanford.edu

  • Hardware: Sun v40z with 4 dual-core Opteron CPUs and 32GB of memory

Number of compute nodes: none (the head node is also the compute node)

Software:

  • GNU and open source software and compilers
  • Pathscale FORTRAN compiler (/usr/local/pathscale)
  • MATLAB (10 licenses), Comsol (1 license), IDL, SEADAS
  • AFS

Usage:

  • MATLAB and Comsol
  • Programs that do not use MPI or SMP
  • Users do not submit jobs via SGE. Jobs are run from the command line.

Logging in to the Cluster head nodes

  • You must use SSH or SFTP
  • You must log in from a Stanford host. If you wish to log in from outside Stanford, please contact dennis@stanford.edu for firewall information.
  • If you are using SecureCRT to connect to cees-sparc from a Windows machine and you have difficulty in connecting, try using Keyboard Interactive as the first authentication method.

Disk Space

The Opteron Cluster and the Sparc Cluster have separate home directory partitions. The Tool Cluster mounts both home directory partitions.

  • Both home directory partitions are 285GB
  • The home directory partitions are RAID 1 (mirrored disks)
  • The home directories are mounted on the head node and the compute nodes
  • There are no quotas
  • There are no backups

In addition to the home directories, there are several data directories available.

  • The data partitions are 2TB
  • The data partitions are RAID 0 (striped, not mirrored)
  • The data partitions are mounted in /data on the head nodes and the compute nodes
  • There are no quotas
  • There are no backups

Neither the home directories nor the temp partitions are designed for permanent storage. We do not back up the data, and files that have not been accessed for a significant period of time will periodically be removed. Please clean up your temporary files and download your results as soon as possible. Remember that deleted files are not recoverable.

Please note the relative sizes of /home and the temp partitions. Use /home for source code; use the temp partitions for binaries, data sets and results.  

Software Environment

Your account has been created with several default paths for system software. If you alter your .tcshrc, please keep these paths and commands. If you accidentally delete something you need from your .tcshrc, the default files are located in /etc/skel on the head nodes.

Please note that the paths are different between the Linux and the Solaris environments.  If you can't find a program or you run the wrong version (gcc versus cc, for example), check your path and make sure the order is correct.

We have the GNU compilers and many open source programs available on the clusters. You can request new packages via the software request form on the web page.

If you have problems with the clusters, please report them via the web page.

Documentation

Linux

Solaris 10 and Sun Studio

MPICH

  • Various documents can be found in /usr/local/mpich/doc

Sun Grid Engine

  • For a description of the commands and options, type man qsub on the command line
  • User's manuals are located in /usr/local/DOC on the head nodes

Various other software

  • Look in the install directory (usually /usr or /usr/local)

Basic Commands and Examples

Examples and documents about SGE are located in /usr/n1ge6 under 'examples' and 'doc' on the head nodes. Information about MPICH can be found in /usr/local/mpich.

All jobs are submitted to SGE via a job script which contains various definitions (such as a parallel environment), options, and the location of your binary executable program.

Some basic SGE commands (see the man page for more options):

qsub - submits a job script to the Grid Engine. After submission, the Grid Engine will execute the job on the cluster and return the results to you.   Examples of job scripts are below.

qstat - checks status of jobs.

qdel - deletes a job. Takes the job number as the argument.

There are two ways of giving options to SGE: on the command line as options to 'qsub', or in the job script on lines starting with '#$'. See the examples below.

By default, SGE puts the standard output and standard error in your working directory under the names

            <job name>.o<job id>

            <job name>.e<job id>

where <job id> is a unique number used by SGE to identify your job. This location can be changed via an option to qsub or in the job script.

Some useful options to qsub (these options are used in the example scripts below):

  • qsub -V passes your entire environment to SGE. This is useful when you have set a specific LD_LIBRARY_PATH, or you want to use a tcsh environment variable in your job script.
  • qsub -o will change the output path for standard output
  • qsub -e will change the output path for standard error
  • qsub -cwd tells SGE to run the job in the current working directory
  • qsub -q tells SGE which queue to use for execution

Job Queues in the CEES Grid

The execution hosts in the Opteron Cluster and the Sparc Cluster are organized into queues. The Tool Cluster does not have any queues since it is a standalone host.

The Sparc Cluster has a single execution node and a single queue, named all.q. Since all.q is the default, specifying a queue when submitting a job is optional. The queue also has two aliases, petromod and cre, which various applications use when they need a default queue name.

The 64 nodes in the Opteron Cluster are organized into two queues:

  • 2GB.q
    • Memory per node: 2GB
    • Number of nodes: 32
    • Available CPUs (slots): 64 (2 per node)
    • Nodes: cees-node-001 to cees-node-032
    • Parallel environment names: mpich (round robin) and mpich2 (fill first)
  • 8GB.q
    • Memory per node: 8GB
    • Number of nodes: 32
    • Available CPUs (slots): 64 (2 per node)
    • Nodes: cees-node-033 to cees-node-064
    • Parallel environment name: mpich8

There is a special option for SMP jobs in the Opteron Cluster. If your program needs to run on both CPUs of a node (SMP), add -l smp=2 to the -pe line in your job script. This allocates both CPUs on each node to your job. Note, however, that this halves the number of available slots in the queue. In this case, make sure your -pe line requests the number of nodes you need, not the number of CPUs.

Parallel Environments

You must specify a parallel environment to use MPI in the CEES Clusters. This is most easily done in the job script on the '-pe' line (see Example 3). You must also tell SGE how many slots (CPUs) your job should run on and which queue to use. The number of slots you request for the parallel environment must not exceed the number of slots available in the queue. For example, this line requests the mpich environment with 16 slots; submit the job to the 2GB.q with qsub -q 2GB.q:

            -pe mpich 16

Please note that the two queues in the Opteron Cluster, 2GB.q and 8GB.q, have different parallel environment names.

The Sparc Cluster has a single parallel environment named mpich.

The mpich parallel environment in the 2GB.q on the Opteron Cluster uses a 'round robin' algorithm. When allocating processes to the nodes, SGE puts the first process on one CPU of the first node, the second process on one CPU of the second node, and so on. Once every node has a process, SGE starts back at the first node and uses its second CPU. Example:

            Process 1: cpu1 of node1
            Process 2: cpu1 of node2
            Process 3: cpu1 of node3
                        ...
            Process 32: cpu1 of node32
            Process 33: cpu2 of node1
            Process 34: cpu2 of node2
            Process 35: cpu2 of node3
                        etc.

The mpich2 parallel environment in the 2GB.q on the Opteron Cluster uses a 'fill first' algorithm. When allocating processes to the nodes, SGE puts the first process on the first CPU of the first node and the second process on the second CPU of the first node. Each node is 'filled' before the next node is used. Example:

            Process 1: cpu1 of node1
            Process 2: cpu2 of node1
            Process 3: cpu1 of node2
            Process 4: cpu2 of node2
                        etc.

The mpich8 parallel environment in the 8GB.q on the Opteron Cluster uses a 'fill first' algorithm. There is no 'round robin'.

The mpich parallel environment in the Sparc Cluster uses a 'round robin' algorithm.

Example 1) A 'simple' example:

If you don't already have an account on the CEES Grid, click on account request in the menu under 'HPTC Facilities' on the left. Fill out the form, and wait for an email letting you know that your account is ready.

You will use your SUNetID and SUNetID password to log in.

Now we will copy a sample SGE job script into your home directory, do some editing, create a data directory for execution and results, submit the job script to SGE, and then look at the results. We will modify the job script slightly and run it again.

% cp /usr/n1ge6/examples/jobs/simple.sh .

<edit if necessary>

% ls /data

<pick which temp directory you will use - temp0, temp1, or temp2>

% cd /data/temp#

<think about what you want to call the directory. Many use their SUNetID>

% mkdir <your dir name>

% cd <your dir name>

% cp ~/simple.sh .

% qsub -q 2GB.q simple.sh   (or 'qsub simple.sh' on the Sparc Cluster)

% qstat

The job will take approximately 20-30 seconds to run.

The job name is the name of the job script you submit to SGE, simple.sh in this case.

When you don't see your job listed via qstat any more, look in your data directory for two files with the format

     simple.sh.e{some number} and simple.sh.o{same number}.

Standard output goes to <jobname>.o<jobnumber> in your working directory, in this case simple.sh.o<job number>

Standard error goes to <jobname>.e<jobnumber> in your working directory, in this case simple.sh.e<job number>

The job number will change with each submitted job.

% cat simple.sh.o{job number}

You should see two times listed, the second 20 seconds later than the first.

Now edit the simple.sh script. Change one of the lines to do an 'ls', run it, and look at the output.

Feel free to run the other examples in /usr/n1ge6/examples/jobs, but note that 'simpleio' will not work in our environment.

If you have a program you want to run, compile it, then replace the 'date' lines in simple.sh with the name of your program.

Try this example on the Opteron Cluster and the Sparc Cluster.

Example 2) 'hello world'

Using your favorite editor, create a program file using the following code, and name the file 'hello.c'.

#include <stdio.h>

int main(void) {
    printf("Hello world!\n");
    return 0;
}

Compile it using the command

            % cc -o hello hello.c

Using your favorite editor, create a script file using the following code, and name the script 'testrun.sh'.

#!/bin/tcsh
#
#$ -S /bin/tcsh
#
#$ -cwd
#
date
./hello
# end of script

Copy your executable and script to your data directory, and submit the script to SGE.

On the Opteron Cluster:

            % qsub -V -q 2GB.q testrun.sh

On the Sparc Cluster:

            % qsub -V testrun.sh

Example 3) An MPI version of 'hello world'

Using your favorite editor, create a program file using the following code, and name the file 'hellompi.c'.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myid, numprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    printf("Hello World! I am #%d of %d procs\n", myid, numprocs);
    MPI_Finalize();
    return 0;
}

Compile it using the command

            % mpicc -o hellompi hellompi.c

Using your favorite editor, create a script file using the following code, and name the script 'testrun2.sh'. Note the additional options added to the script, and the options you will need to change.

#!/bin/tcsh
#
# use tcsh for scripts
#$ -S /bin/tcsh
# set up the parallel environment and request 16 slots
#$ -pe mpich 16
# put standard error results in this directory; create the directory or change the path
#$ -e /data/temp#/<YOUR NAME>/Results
# put standard output results in this directory; create the directory or change the path
#$ -o /data/temp#/<YOUR NAME>/Results
# use current working directory
#$ -cwd
#
# variable NSLOTS is set by SGE
# variable TMPDIR is set by SGE
# the machines file is created by SGE
#
mpirun -np $NSLOTS -machinefile $TMPDIR/machines hellompi
# end of script

Submit the script on cees-opteron by using the command

            % qsub -q 2GB.q testrun2.sh

To run the same job on the 8GB.q, edit the testrun2.sh script to change 'mpich' to 'mpich8', and submit the job with the command:

            % qsub -q 8GB.q testrun2.sh

  Last modified Friday, 22-Jun-2007 09:01:02 PDT
Please contact the Webmaster with suggestions or comments.