Sunday, June 5, 2016

SGE

SGE job history
/var/lib/gridengine/default/common$ tail -f accounting
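
The raw accounting file is one colon-separated record per job, which is hard to read with tail alone. Below is a minimal sketch that prints a few fields per record. The function name sge_acct_summary is made up for this note, and the field positions (4 = owner, 5 = job name, 6 = job number, 13 = exit status) are an assumption based on the layout documented in accounting(5); check them against your version. qacct -j <job-id> is the standard tool for the same information.

```sh
# Print owner, job name, job number and exit status for each accounting record.
# Assumes the colon-separated layout from accounting(5):
#   field 4 = owner, 5 = job_name, 6 = job_number, 13 = exit_status
sge_acct_summary() {
    # $1: path to the accounting file
    # Lines starting with '#' are header comments and are skipped.
    awk -F: '!/^#/ && NF >= 13 { print $4, $5, $6, "exit=" $13 }' "$1"
}
```

Usage, with the path from above: sge_acct_summary /var/lib/gridengine/default/common/accounting | tail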



qsub sge-root/examples/jobs/simple.sh
After the job finishes, check your home directory for the redirected output files: script-name.ojob-id (stdout) and script-name.ejob-id (stderr).
job-id is a unique integer assigned consecutively to each job.
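
To find those files from a script, the job id can be taken from what qsub prints. This is a rough sketch: the helper names are made up for this note, and the parsing assumes the usual confirmation message of the form Your job 42 ("simple.sh") has been submitted.

```sh
# Extract the job id from qsub's confirmation message read on stdin.
# Assumes the form: Your job 42 ("simple.sh") has been submitted
sge_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Print the stdout/stderr file names expected for a job name and job id.
sge_output_files() {
    # $1: script (job) name, $2: job id
    printf '%s.o%s\n%s.e%s\n' "$1" "$2" "$1" "$2"
}
```

Usage: id=$(qsub sge-root/examples/jobs/simple.sh | sge_job_id), then sge_output_files simple.sh "$id" lists the two files to look for in the home directory.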


/usr/lib/lsb/remove_initd /etc/init.d/sgeexecd.cluster

./install_qmaster -m -noremote -auto util/install_modules/sge_configuration.conf

Default location of Sun Grid Engine log files:
<qmaster_spool_dir>/messages
<qmaster_spool_dir>/schedd/messages
<execd_spool_dir>/<hostname>/messages
<sge_root>/<sge_cell>/common/accounting
<sge_root>/<sge_cell>/common/statistics
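
A small sketch for expanding the last two placeholders into real paths. The function name is made up for this note; the fallback values /var/lib/gridengine and default are the ones the Debian/Ubuntu package reports at install time and are only an assumption for other installs.

```sh
# Print the full path of a file in the cell's common directory.
# Falls back to the Debian package defaults when SGE_ROOT/SGE_CELL are unset.
sge_common_file() {
    # $1: file name under common/ (for example accounting or statistics)
    printf '%s/%s/common/%s\n' \
        "${SGE_ROOT:-/var/lib/gridengine}" "${SGE_CELL:-default}" "$1"
}
```

Usage: tail -f "$(sge_common_file accounting)"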

Add a user to the submit queue
$qconf -sul
$qconf -su ceshi
name    ceshi
type    ACL DEPT
fshare  0
oticket 0
entries ceshi,ano
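
The listing above only shows the user list. To actually add a user, qconf -au <user> ceshi does it directly. As an alternative, here is a sketch that edits a dump of the list so it can be loaded back with qconf -Mu <file>. The function name is made up for this note; it assumes the entries value sits on a single line (no backslash continuation) and it does not check for duplicates.

```sh
# Append a user to the "entries" line of a userset dump (from: qconf -su NAME).
# Assumes "entries" is a single line; an empty list is written as NONE.
userset_add_user() {
    # $1: user name to add, $2: userset dump file
    awk -v user="$1" '
        $1 == "entries" {
            if ($2 == "NONE" || $2 == "") { $0 = "entries " user }
            else                          { $0 = $0 "," user }
        }
        { print }
    ' "$2"
}
```

Usage: qconf -su ceshi > ceshi.txt; userset_add_user newuser ceshi.txt > ceshi.new; qconf -Mu ceshi.new (newuser is a placeholder).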

$qconf -as master.local

$qconf -shgrp @ceshi >hosts4group
add an entry to hosts4group
$qconf -Mhgrp hosts4group
$qconf -se execd2 >execd.txt
modify execd.txt
$qconf -Me execd.txt

$qconf -sq ceshi.q > queue.txt
$qconf -Mq queue.txt
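
The dump, edit, load pattern above can be scripted instead of editing the file by hand. A minimal sketch follows; the function name sge_conf_set is made up for this note, and it only handles attributes whose value fits on one line (values continued with a backslash are not covered).

```sh
# Replace the value of one attribute in a qconf dump (queue, exec host, ...).
# Only single-line "name value" attributes are handled.
sge_conf_set() {
    # $1: attribute name, $2: new value, $3: dump file
    awk -v key="$1" -v val="$2" '
        $1 == key { print key, val; next }   # rewrite the matching line
        { print }                            # pass everything else through
    ' "$3"
}
```

Usage: qconf -sq ceshi.q > queue.txt; sge_conf_set slots 4 queue.txt > queue.new; qconf -Mq queue.new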



Adding an execution host
  • Make the new host an administrative host

    qconf -ah <hostname>
  • As root on this new host, run the following script from $SGE_ROOT

    install_execd

Removing an execution host
  • First, delete the queues associated with this host
    qconf -sql    (shows a list of all queues)

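          All of the following steps are performed on the admin node.
    1. qconf -mq ceshi.q    (remove the entries for the node being deleted)

    3. Remove the entry for the node being deleted from /etc/hosts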
  • Delete the host
    qconf -mhgrp @ceshi    (remove the entry for <hostname>)
    qconf -de <hostname>
    qconf -dh <hostname>
  • Finally, delete the configuration for the host

    qconf -dconf <hostname>
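
The /etc/hosts cleanup from the removal steps above can be previewed before touching the real file. This is only a sketch with a made-up function name; it prints a filtered copy and changes nothing, and it matches the hostname as a whole word in the name columns.

```sh
# Print a hosts file without the lines that name the given host.
# Nothing is modified; review the output before replacing /etc/hosts.
hosts_without() {
    # $1: hostname to drop, $2: hosts file (for example /etc/hosts)
    awk -v host="$1" '
        /^[[:space:]]*#/ { print; next }   # keep comment lines untouched
        {
            for (i = 2; i <= NF; i++)
                if ($i == host) next       # skip lines naming the host
            print
        }
    ' "$2"
}
```

Usage: hosts_without execd2 /etc/hosts > hosts.new; diff /etc/hosts hosts.new, then copy hosts.new into place as root once it looks right.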








Install On Master

sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon sun-java6-jre
  1. Java is needed for it to run.
  2. In the installer, choose not to send email.
  3. Set the cell name to default.
  4. It is important to set the master name to your actual external hostname. To find it, type
hostname -I
# this gives you an IP; then type
host thisIP
Then give the fully qualified name of your machine as the host.
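
The two lookups above can be chained. A sketch, assuming the reverse lookup prints a line of the form ... domain name pointer master.example.com. (the function name and the example hostname are made up for this note):

```sh
# Read the output of `host <ip>` on stdin and print the first
# reverse-DNS name without its trailing dot.
fqdn_from_host_output() {
    awk '/domain name pointer/ { sub(/\.$/, "", $NF); print $NF; exit }'
}
```

Usage: host "$(hostname -I | awk '{print $1}')" | fqdn_from_host_output. hostname -I can print several addresses; this takes the first one, which may not be the external interface on a multi-homed machine.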

Install On Workers

sudo apt-get install gridengine-exec gridengine-client

Starting on Master

Try starting it inside a screen session with
sudo su
sge_qmaster
sge_execd
qmon
It will probably fail with some message like
$ sudo qmon
Warning: Cannot convert string "-adobe-courier-medium-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-bold-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-medium-r-*--12-*-*-*-m-*-*-*" to type FontStruct
X Error of failed request: BadName (named color or font does not exist)
Major opcode of failed request: 45 (X_OpenFont)
Serial number of failed request: 643
Current serial number in output stream: 654
Then install fonts:
sudo apt-get install xfonts-base xfonts-100dpi xfonts-75dpi
Then restart the computer (no, really!)
If you get errors saying it cannot connect, make sure no other SGE process is already running and that the fully qualified hostname is set correctly.
To find possibly running instances (then kill them by PID):
ps aux | grep "sge"
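
The grep above also matches itself and anything else with "sge" in its command line. A slightly tighter sketch follows, assuming the daemons are named sge_qmaster and sge_execd and that the command is the 11th column of ps aux output (the function name is made up for this note):

```sh
# Read `ps aux` output on stdin and print "PID command" for SGE daemons only.
# Assumes the command path is column 11 and ends in sge_qmaster or sge_execd.
sge_daemons() {
    awk '$11 ~ /sge_(qmaster|execd)$/ { print $2, $11 }'
}
```

Usage: ps aux | sge_daemons, then sudo kill <PID> for each line printed.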
To reconfigure the existing install after changing your hostname (on Ubuntu, in /etc/hostname) to match the one returned by the host command above:
sudo dpkg-reconfigure gridengine-master
# or, to purge the whole install including its config files and start over clean:
sudo apt-get purge gridengine-common gridengine-exec gridengine-client
Then follow part 2 of http://scidom.wordpress.com/tag/parallel/ to set up your main queue with the GUI and try their simple hello world example.
Another useful resource: http://helms-deep.cable.nu/~rwh/blog/?p=159



sudo apt-get install gridengine-client gridengine-common gridengine-master

Creating config file /etc/default/gridengine with new version
Setting up gridengine-client (6.2u5-7.3) ...
Setting up gridengine-master (6.2u5-7.3) ...
Initializing cluster with the following parameters:
 => SGE_ROOT: /var/lib/gridengine
 => SGE_CELL: default
 => Spool directory: /var/spool/gridengine/spooldb
 => Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete
