SGE job history
/var/lib/gridengine/default/common$ tail -f accounting
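Each record in the accounting file is a single colon-separated line, so a few fields can be pulled out with awk. This is a minimal sketch, assuming the field order described in accounting(5) for SGE 6.2 (owner in field 4, job_name in field 5, job_number in field 6, exit_status in field 13). The function name sge_job_summary is made up for illustration, and the field positions should be checked against the man page of your version.

```shell
#!/bin/sh
# Hypothetical helper: summarize the records of an SGE accounting file.
# Assumed accounting(5) field order: 4=owner, 5=job_name,
# 6=job_number, 13=exit_status. Lines starting with '#' are skipped.
sge_job_summary() {
    awk -F: '!/^#/ && NF >= 13 {
        printf "%s %s %s exit=%s\n", $6, $4, $5, $13
    }' "$1"
}

# Example, using the path from the listing above:
# sge_job_summary /var/lib/gridengine/default/common/accounting
```

Each output line has the form "job-id owner job-name exit=status", which is easier to scan than the raw record.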
qsub sge-root/examples/jobs/simple.sh
After the job finishes executing, check your home directory for the redirected stdout/stderr files <script-name>.e<job-id> and <script-name>.o<job-id>.
<job-id> is a consecutive, unique integer assigned to each job.
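The naming pattern for the output files can be sketched as a small helper. It assumes the default behaviour where the job name is the file name of the submitted script (no -N, -o or -e options given to qsub); sge_output_files is a made-up name, not an SGE command.

```shell
#!/bin/sh
# Hypothetical helper: print the default stdout/stderr file names of a job,
# following the <script-name>.o<job-id> / <script-name>.e<job-id> pattern.
sge_output_files() {
    script=$(basename "$1")   # assumed: job name defaults to the script file name
    job_id=$2
    printf '%s.o%s\n%s.e%s\n' "$script" "$job_id" "$script" "$job_id"
}

# For job 42 this prints simple.sh.o42 and simple.sh.e42
sge_output_files sge-root/examples/jobs/simple.sh 42
```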
/usr/lib/lsb/remove_initd /etc/init.d/sgeexecd.cluster
./install_qmaster -m -noremote -auto util/install_modules/sge_configuration.conf
Default location of Sun Grid Engine log files:
<qmaster_spool_dir>/messages
<qmaster_spool_dir>/schedd/messages
<execd_spool_dir>/<hostname>/messages
<sge_root>/<sge_cell>/common/accounting
<sge_root>/<sge_cell>/common/statistics
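To see which of these files exist on a given machine, the placeholders can be filled in from shell variables. This is only a sketch: sge_log_paths is a made-up helper, and the two spool directories used in the example call are assumptions about a Debian/Ubuntu package install, so substitute the values from your own configuration.

```shell
#!/bin/sh
# Hypothetical helper: expand the log file locations listed above.
# Arguments: qmaster_spool_dir execd_spool_dir hostname sge_root sge_cell
sge_log_paths() {
    printf '%s\n' \
        "$1/messages" \
        "$1/schedd/messages" \
        "$2/$3/messages" \
        "$4/$5/common/accounting" \
        "$4/$5/common/statistics"
}

# Spool directories below are assumed, not taken from a real installation.
sge_log_paths /var/spool/gridengine/qmaster /var/spool/gridengine/execd \
    "$(uname -n)" /var/lib/gridengine default |
while read -r f; do
    if [ -e "$f" ]; then echo "found:   $f"; else echo "missing: $f"; fi
done
```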
Adding a user to a submit queue
$ qconf -sul
$ qconf -su ceshi
name ceshi
type ACL DEPT
fshare 0
oticket 0
entries ceshi,ano
$ qconf -as master.local
$ qconf -shgrp @ceshi > hosts4group
Add an entry to hosts4group
$ qconf -Mhgrp hosts4group
$ qconf -se execd2 > execd.txt
Modify execd.txt
$ qconf -Me execd.txt
$ qconf -sq ceshi.q > queue.txt
$ qconf -Mq queue.txt
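The three pairs of commands above follow the same pattern: dump the object to a file, edit the file, load it back with the capital-letter option. For the host group case, the edit step can be scripted. The sketch below assumes the dump has the hostgroup(5) layout of a "group_name" line followed by a single "hostlist" line; long host lists that qconf wraps with a trailing backslash are not handled. add_host_to_hgrp is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: append a host to the hostlist line of a dumped
# host group file (as written by qconf -shgrp), unless already present.
add_host_to_hgrp() {
    file=$1
    host=$2
    tmp=$(mktemp) || return 1
    awk -v h="$host" '
        $1 == "hostlist" {
            if ($2 == "NONE") { print "hostlist " h; next }      # empty group
            for (i = 2; i <= NF; i++) if ($i == h) { print; next } # already listed
            print $0 " " h
            next
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Intended use together with the commands above (node3 is a placeholder):
# qconf -shgrp @ceshi > hosts4group
# add_host_to_hgrp hosts4group node3
# qconf -Mhgrp hosts4group
```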
Adding an execution host
- Make the new host an administrative host
qconf -ah <hostname>
- As root on this new host, run the following script from $SGE_ROOT
install_execd
Removing an execution host
- First, delete the queues associated with this host
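qconf -sql shows a list of all queues
All of the following operations are performed on the management node.
1. qconf -mq ceshi.q: remove the entries for the node being deleted
3. /etc/hosts: remove the entries for the node being deleted
- Delete the host
qconf -mhgrp @ceshi to remove the entry for <hostname>
qconf -de <hostname>
qconf -dh <host>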
- Finally, delete the configuration for the host
qconf -dconf <hostname>
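Since the order of the removal steps matters, it can help to print them for a specific host before running anything. The sketch below only echoes the qconf commands from this section with the names filled in; it does not call qconf. print_remove_host_steps is a made-up helper, and execd2, ceshi.q and @ceshi are simply the example names used earlier in these notes.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the removal commands for one host
# in the order described above. Nothing is executed.
# Arguments: hostname queue hostgroup
print_remove_host_steps() {
    host=$1
    queue=$2
    hgrp=$3
    cat <<EOF
qconf -mq $queue  # edit the queue and remove $host
qconf -mhgrp $hgrp  # edit the host group and remove $host
qconf -de $host  # delete the execution host
qconf -dh $host  # delete the administrative host
qconf -dconf $host  # delete the configuration for the host
EOF
}

print_remove_host_steps execd2 ceshi.q @ceshi
```

The /etc/hosts cleanup is left out because it is a manual file edit rather than a qconf call.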
Install On Master
sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon sun-java6-jre
- Java is needed for it to run.
- In the installer, choose not to send email
- Set the cell name to default
- It is important to set the master name to your actual external hostname. To find it, type
hostname -I
#this gives you an IP, then type
host thisIP
Then give the fully qualified name of your machine as host.
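The two lookups can be chained. This sketch assumes the usual reverse-lookup output of host, a line of the form "<reversed-ip>.in-addr.arpa domain name pointer <name>.", and strips the trailing dot. fqdn_from_host_output is a made-up name, and only the first address printed by hostname -I is used.

```shell
#!/bin/sh
# Hypothetical filter: read `host <ip>` output on stdin and print the
# first PTR name without its trailing dot.
fqdn_from_host_output() {
    awk '/domain name pointer/ {
        name = $NF
        sub(/\.$/, "", name)
        print name
        exit
    }'
}

# Typical use on the master (hostname -I may list several addresses,
# so only the first one is looked up):
# host "$(hostname -I | awk '{ print $1 }')" | fqdn_from_host_output
```

If nothing is printed, the address has no reverse DNS entry and the name has to come from somewhere else, for example /etc/hosts.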
Install On Workers
sudo apt-get install gridengine-exec gridengine-client
Starting on Master
Try starting it inside a screen session with
sudo su
sge_qmaster
sge_execd
qmon
It will probably fail with some message like
$ sudo qmon
Warning: Cannot convert string "-adobe-courier-medium-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-bold-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-medium-r-*--12-*-*-*-m-*-*-*" to type FontStruct
X Error of failed request: BadName (named color or font does not exist)
Major opcode of failed request: 45 (X_OpenFont)
Serial number of failed request: 643
Current serial number in output stream: 654
Then install fonts:
sudo apt-get install xfonts-base xfonts-100dpi xfonts-75dpi
Then restart the computer (no, really!)
If you run into problems that say you cannot connect, make sure you don't have another sge process already running and that you have the fully qualified hostname set correctly.
To find possibly running instances (then kill them by PID):
ps aux | grep "sge"
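The grep output still has to be read by hand, and it includes the grep process itself. A stricter variant is sketched below: it matches the command name exactly and prints only the PIDs of sge_qmaster and sge_execd. sge_pids is a made-up name, and the final kill line is left commented out on purpose.

```shell
#!/bin/sh
# Hypothetical filter: read "pid command-name" lines on stdin and print
# the PIDs of the two SGE daemons named in these notes.
sge_pids() {
    awk '$2 == "sge_qmaster" || $2 == "sge_execd" { print $1 }'
}

# List the PIDs of running SGE daemons (prints nothing if none are running):
ps -eo pid=,comm= | sge_pids

# After checking the list, the processes could be stopped with:
# kill $(ps -eo pid=,comm= | sge_pids)
```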
To reconfigure the existing installation after changing your hostname (on Ubuntu, in /etc/hostname) to match the one you got from the host command above:
sudo dpkg-reconfigure gridengine-master
# Or, to purge the whole install along with its config files and start over clean:
sudo apt-get purge gridengine-common gridengine-exec gridengine-client
Then follow part 2 of http://scidom.wordpress.com/tag/parallel/ to use the GUI to set up your main queue and try their simple hello world example.
Other useful resources are: http://helms-deep.cable.nu/~rwh/blog/?p=159
sudo apt-get install gridengine-client gridengine-common gridengine-master
Creating config file /etc/default/gridengine with new version
Setting up gridengine-client (6.2u5-7.3) ...
Setting up gridengine-master (6.2u5-7.3) ...
Initializing cluster with the following parameters:
=> SGE_ROOT: /var/lib/gridengine
=> SGE_CELL: default
=> Spool directory: /var/spool/gridengine/spooldb
=> Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete