I have a question about ... Who should I ask?
Use the form under HPTC Facilities called 'ask a question'. You will be contacted by the appropriate staff member. You can also contact the HPTC Manager in room 415 of the Mitchell Building.
How do I log in?
From a UNIX or Mac machine, use SSH (version 2) to connect to one of the head nodes: cees-cluster.stanford.edu, cees-opteron.stanford.edu, or cees-tool.stanford.edu. On a Windows machine, use SecureCRT.
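As a minimal sketch of the UNIX/Mac case, the login can be wrapped in a small shell function. The account name 'jdoe' and the function name 'cees_login' are placeholders made up for illustration; only the head node name comes from the answer above. Current OpenSSH clients use protocol version 2 by default, so no extra flag is shown.

```shell
# Placeholder account name (an assumption); substitute your own.
CEES_USER="jdoe"
# One of the head nodes listed above.
CEES_HOST="cees-cluster.stanford.edu"

# Hypothetical helper: opens an interactive shell on the head node.
cees_login() {
    ssh "${CEES_USER}@${CEES_HOST}"
}
```

After sourcing these lines, typing 'cees_login' starts the session. Nothing connects until the function is called.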
How do I download/upload files or data sets?
From a Windows box, use SecureFX. From a Mac or UNIX box, use SCP. Generally, you should copy source code to your home directory and data sets to the /data partitions. See the doc 'Getting Started' for more details. There are also large external disks (~400GB) that you may be able to use. In that case, hook the disk up to your machine, transfer your files to it, then take the disk to the CEES Lab and hook it up to a machine there that is connected to the CEES servers via a high-speed network, and transfer the files. To download, reverse the process. Contact the HPTC Manager for details.
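For the Mac/UNIX case, a rough sketch of the SCP transfers might look like the following. The account name 'jdoe', the subdirectory '/data/jdoe', and both function names are assumptions for illustration; check 'Getting Started' for the directory you should actually use.

```shell
CEES_USER="jdoe"                        # placeholder account name
CEES_HOST="cees-cluster.stanford.edu"   # head node from the login answer
CEES_DATA_DIR="/data/jdoe"              # hypothetical directory under /data

# Upload a local file or directory ($1) to the data directory.
# -r copies directories recursively; -p preserves times and modes.
cees_upload() {
    scp -rp "$1" "${CEES_USER}@${CEES_HOST}:${CEES_DATA_DIR}/"
}

# Download a path relative to the data directory ($1) into a local
# destination ($2).
cees_download() {
    scp -rp "${CEES_USER}@${CEES_HOST}:${CEES_DATA_DIR}/$1" "$2"
}
```

For example, 'cees_upload survey_2006' would copy a local directory named survey_2006 to the cluster, and 'cees_download results .' would bring a results directory back.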
Where is the documentation?
Look in "Getting Started with the CEES Grid", or for:
What are the limits for job time and number of processors?
The development cluster (currently the cees-opteron.stanford.edu nodes) will be older hardware and will be limited to short runs of one hour during the 8am-10pm period. From 10pm to 8am, longer jobs of up to 3 hours may be submitted.
Each PI, and their designates, will be provided with a queue associated with nodes equal to their CP. They will have guaranteed highest-priority access to these nodes within 3 hours (current proposal is 5 hours). In addition, they will have access to all other queues on the cluster. Any job submitted to another PI's queue may not exceed the 3 hour limit without prior approval (jobs exceeding this limit will be killed). The priority of jobs submitted to other PIs' queues will depend on the submitting PI's recent usage. Every month CEES will calculate the total number of hours where PIs have not used their own resources. We will average this non-use over a three-month period. This number will be used to calculate the "free" resources available in a given month. Each PI will be given highest-priority hours equal to the free resources*CP. After they have reached this limit, they will drop to the lowest priority in all queues except their own. A PI's queue status on the production cluster will be mirrored on the development cluster for longer night runs.
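To make the "free resources*CP" rule concrete, here is a small worked example. All numbers are invented, and it assumes that CP can be expressed as a percentage share of the cluster; the actual definition of CP and the real accounting figures are not given here.

```shell
# Hypothetical unused hours for the last three months (made-up values).
m1=1200
m2=900
m3=1500

# Assumption: this PI's CP, expressed as a percent share.
cp_percent=25

# Three-month average of unused hours = "free" resources for the month.
free_hours=$(( (m1 + m2 + m3) / 3 ))

# Highest-priority hours for this PI = free resources * CP.
priority_hours=$(( free_hours * cp_percent / 100 ))

echo "free=${free_hours} priority=${priority_hours}"
# prints: free=1200 priority=300
```

Under these assumptions, the PI could use 300 hours at highest priority in other queues before dropping to lowest priority there.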
Do I need to budget my cycles?
No. We do accounting, but it is mainly to see what groups are using the resources.
What OS is running?
The Opteron Cluster and the Tool Cluster run CentOS, a Red Hat 'clone' that is completely compatible with Red Hat Linux.
What compilers are available?
On the Linux machines in the Opteron Cluster and the Tool Cluster, we have the GNU compilers (gcc, etc.). We also have the PathScale Fortran compiler (/usr/local/pathscale).
I need some software installed.
Go to http://cees.stanford.edu and click on the HPTC tab on the left, and fill out the form under 'change/software request'. Generally useful public domain software will be installed in a day or two. The installation of software that costs money will be delayed until the funding source is identified.
How do I submit an MPI job?
Direct execution of mpirun is disabled. See the document "Getting Started with the CEES Grid" on our web site at http://cees.stanford.edu/.
How can I recover a deleted file?
How do I set up remote X11?
If you wish to run programs on the CEES machines but display the output via X11 on your workstation, follow the instructions for the type of workstation you have.
Linux/Solaris: log in to the CEES machine using 'ssh -X' followed by the head node name. The '-X' turns on X11 forwarding.
Mac: open a terminal and log in to the CEES machine using 'ssh -X' followed by the head node name. If that doesn't work, try 'ssh -X -Y'.
Windows: this is much more complicated. You first need to run an X server on your Windows box before you log in to the CEES machine using SecureCRT.
- Download an X11 server from http://sourceforge.net/projects/xming . Get the files xming-mesa and xming-fonts, and install them on your Windows machine. Run the server (it may start up as part of the install).
- Start SecureCRT. In the sessions listing, right click on the CEES machine name, go to properties, and under 'Remote/X11', check 'Forward X11 packets'. Click OK and then connect to the server.
- Run the program as normal. When the program starts, it will open a window on your Windows box.
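For the Linux/Solaris and Mac cases, one quick way to confirm that forwarding is active is to ask the remote side for its DISPLAY variable. This is only a sketch: the account name 'jdoe' and the function name 'cees_x11_check' are placeholders, not part of the CEES setup.

```shell
CEES_USER="jdoe"                        # placeholder account name
CEES_HOST="cees-cluster.stanford.edu"   # one of the head nodes

# Hypothetical helper: log in with X11 forwarding and print the remote
# DISPLAY value. printenv prints nothing if DISPLAY is unset, which
# suggests that forwarding was not set up.
cees_x11_check() {
    ssh -X "${CEES_USER}@${CEES_HOST}" printenv DISPLAY
}
```

If 'cees_x11_check' prints a value such as localhost:10.0, X11 programs started in an 'ssh -X' session should open on your workstation. The exact value will vary.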
How can I improve the performance of my program?
Contact Bob Clapp.
What libraries are available?
Many libraries are already compiled and installed. Besides the usual places, check the libraries in /usr/local. If you can't find what you need, go to the web site and fill out the 'ask a question' form.
Where should I write my output files?
Please direct program output to the /data directories, not to your home directory. There's more space available in /data than in /home.
Can I mount the CEES disk partitions via SAMBA or NFS to my machine?
No, we do not provide that service.
Permission denied error when running MPI - Mpirun is disabled for direct execution. See the document "Getting Started with the CEES Grid" at http://cees.stanford.edu/.
I try to connect to one of the head nodes, but my machine can't connect to it - You must connect from a Stanford host to log in to the CEES Grid. Our firewall blocks non-Stanford hosts. It might be possible to open a hole in the firewall for your machine. Go to http://cees.stanford.edu, click on the 'HPTC Facilities' tab on the left, and fill out the form under 'ask a question', describing where you are connecting from and giving the IP address of your machine, or contact the HPTC Manager directly. Alternatively, you can first log in to a Stanford machine such as pangea.stanford.edu, then log in to the CEES head nodes. You can also log in to one of the Sweet Hall machines, then log in to CEES.
My program gets a segmentation fault - This is usually caused by a bad pointer in your program. Protection faults are logged in the system messages, which you can check by running 'dmesg'. You may see a line like: pw.x general protection rip:2a9581cbe5 rsp:7fbffff200 error:0 where 'pw.x' is the name of the program that faulted. The Linux kernel logs unhandled user signals, and the x86-64 architecture raises a general protection fault for "non-canonical" pointers. These are the same as a normal segfault, but mapped to a different exception.
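To pick your program's faults out of a long dmesg listing, a small filter along these lines may help. The function name 'filter_faults' is made up for illustration, and the pattern only covers the "general protection" wording quoted above plus the word "segfault"; your kernel may use other wording.

```shell
# Hypothetical helper: read kernel log lines on stdin and keep only
# those that mention the given program name ($1) together with a
# protection fault or segfault. The name is treated as a regular
# expression, so a '.' in it matches any character.
filter_faults() {
    grep -E "$1.*(general protection|segfault)"
}
```

A typical use would be 'dmesg | filter_faults pw.x', with your own program name in place of pw.x.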
I compile my program, but the libraries don't link - Make sure you are linking the appropriate libraries and that you are not mixing 64-bit binaries with 32-bit binaries.
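One way to check for a 32-bit/64-bit mix is to ask the standard 'file' utility what each object file or library is. The sketch below assumes 'file' and a grep that supports '-o' are installed; 'elf_class' is a made-up helper name.

```shell
# Hypothetical helper: print "ELF 32-bit" or "ELF 64-bit" for the
# object file, library, or executable given as $1. Prints nothing if
# the file is not an ELF binary.
elf_class() {
    file -b "$1" | grep -oE 'ELF (32|64)-bit'
}
```

Running 'elf_class' on your own object files and on each library you link against should give the same answer for all of them. A mismatch points to the library that needs to be rebuilt or replaced.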
I'm using -cwd in my SGE script, but it doesn't seem to work - Check to see if you are using the 'dirsfile' option in your .cshrc. That option will disable the cwd option in SGE.
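A quick way to see whether your startup file mentions 'dirsfile' is a simple search. This is a rough sketch: 'check_dirsfile' is a made-up helper name, and it reports any line containing the word, including commented-out ones, so read the matching lines before changing anything.

```shell
# Hypothetical helper: search a csh startup file for 'dirsfile'.
# $1 = file to inspect (defaults to ~/.cshrc).
check_dirsfile() {
    rc="${1:-$HOME/.cshrc}"
    # grep -n prints each matching line with its line number.
    if grep -n 'dirsfile' "$rc" 2>/dev/null; then
        echo "dirsfile is referenced in $rc; -cwd may be overridden"
    else
        echo "no dirsfile setting found in $rc"
    fi
}
```

Call 'check_dirsfile' with no arguments to inspect your own .cshrc.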
When I run my program, NFS seems to slow down - Check to see how many files your program is opening. Try to minimize the number of files you have open in the NFS partitions (/home and /data).
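On Linux, the files a running process has open are listed under /proc, so the number that live on the NFS partitions can be counted roughly as follows. The helper name 'count_nfs_files' is made up, and the sketch assumes you own the process and that /proc is available.

```shell
# Hypothetical helper: count the open files of process $1 whose paths
# are under /home or /data. Each entry in /proc/PID/fd is a symbolic
# link to an open file, so 'ls -l' shows "-> /path/to/file".
count_nfs_files() {
    ls -l "/proc/$1/fd" 2>/dev/null | grep -cE ' -> /(home|data)/' || true
}
```

For example, 'count_nfs_files 12345' prints the count for process ID 12345. A large or steadily growing number suggests the program is holding more NFS files open than it needs.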