PostgresHowto


This article explains how to set up (and monitor) an Active/Passive PostgreSQL Cluster, using Pacemaker with Corosync and DRBD. Prepared by Rafael Marangoni, from the BRLink Servidor Linux Team.

Introduction

We use two nodes, one active (answering requests from applications) and the other in passive mode. If the active server goes down, the passive one automatically takes over and becomes the active node.


Preliminary Note

Linux Distribution:

We are using the CentOS 5.5 (64-bit) distribution, but it will probably work on Fedora too (and on Red Hat, for sure). The CentOS installation is simple and classic: select the base packages plus whatever else you like/need. One thing to keep in mind is that we use DRBD to replicate the PostgreSQL data between the nodes, so you'll need a disk or partition dedicated to DRBD. Remember this before partitioning the disks during the CentOS installation.


Network Hardware/Topology:

We use two Gigabit NICs per node: one (eth0) connected to the network (LAN), and the other one (eth1) connected to the other node with a cross-over cable. The cross-over cable is used to improve performance and reliability, because DRBD won't depend on network switches or anything else to replicate data between the nodes. In this tutorial we will use the physical nodes node1.clusterbr.int and node2.clusterbr.int:

  • node1.clusterbr.int: Uses IP 10.0.0.191 (LAN) and IP 172.16.0.1 (cross-over)
  • node2.clusterbr.int: Uses IP 10.0.0.192 (LAN) and IP 172.16.0.2 (cross-over)
  • dbip.clusterbr.int: The Cluster IP, 10.0.0.190. This is the IP that all applications need to point to in order to access PostgreSQL

Disks:

Both nodes have two disks: /dev/sda for the OS, and /dev/sdb for DRBD. As said before, you can use a single disk, as long as you leave one partition exclusively for DRBD.


PostgreSQL:

The PostgreSQL version used in this article is 8.4, but it doesn't really matter: anything you have inside the DRBD device will be replicated across the cluster.


Preparing the Nodes

Disabling SELINUX

We need to disable SELinux:

vi /etc/selinux/config

Change this line only (leaving everything else untouched):

SELINUX=disabled
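
If you don't want to wait for the reboot that comes later in this howto, you can also switch SELinux to permissive mode right away (the config change above takes care of future boots):

setenforce 0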

Setting Hostname

We need to change the hostname and gateway of the nodes:

vi /etc/sysconfig/network

node1:

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=node1.clusterbr.int
GATEWAY=10.0.0.9

node2:

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=node2.clusterbr.int
GATEWAY=10.0.0.9

Configuring network interfaces

Next, we will configure the network interfaces:

node1: The LAN interface

vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.0.0.191
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=a6:1e:3d:67:66:78 

The Cross-Over/DRBD interface

vi /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=172.16.0.1
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=ee:ef:ff:9a:9a:57


node2: The LAN interface

vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.0.0.192
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=52:52:a1:1a:62:32

The Cross-Over/DRBD interface

vi /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=172.16.0.2
NETMASK=255.255.255.0
ONBOOT=yes
HWADDR=1a:18:b2:50:96:1e

Setting DNS Configuration

Setting DNS configuration on both nodes (according to your network):

vi /etc/resolv.conf
search clusterbr.int
nameserver 10.0.0.9

Configuring basic hostname resolution

Configuring /etc/hosts (same config on both nodes):

vi /etc/hosts
127.0.0.1               localhost.localdomain localhost
10.0.0.191              node1.clusterbr.int   node1
10.0.0.192              node2.clusterbr.int   node2
10.0.0.190              dbip.clusterbr.int    dbip

PS: You'll probably want to add other lines to this file, pointing to other addresses on your network.

Checking network connectivity

Let's check if everything is fine:

node1: Pinging node2 (through the LAN interface)

[root@node1 ~]# ping -c 2 node2
PING node2 (10.0.0.192) 56(84) bytes of data.
64 bytes from node2 (10.0.0.192): icmp_seq=1 ttl=64 time=0.089 ms
64 bytes from node2 (10.0.0.192): icmp_seq=2 ttl=64 time=0.082 ms
--- node2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.082/0.085/0.089/0.009 ms 

Pinging node2 (through the cross-over interface)

[root@node1 ~]# ping -c 2 172.16.0.2
PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.
64 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 172.16.0.2: icmp_seq=2 ttl=64 time=0.083 ms
--- 172.16.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.083/0.083/0.083/0.000 ms 

node2:

Pinging node1 (through the LAN interface)
[root@node2 ~]# ping -c 2 node1
PING node1 (10.0.0.191) 56(84) bytes of data.
64 bytes from node1 (10.0.0.191): icmp_seq=1 ttl=64 time=0.068 ms
64 bytes from node1 (10.0.0.191): icmp_seq=2 ttl=64 time=0.063 ms
--- node1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.063/0.065/0.068/0.008 ms

Pinging node1 (through the cross-over interface)

[root@node2 ~]# ping -c 2 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=64 time=1.36 ms
64 bytes from 172.16.0.1: icmp_seq=2 ttl=64 time=0.075 ms
--- 172.16.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.075/0.722/1.369/0.647 ms


Configuring Initialization options

I like to set runlevel to 3.

vi /etc/inittab

Change this line only (leaving everything else untouched):

id:3:initdefault:

I like to remove some services from automatic initialization, keeping only the services that will really be used. These are the active services that we'll need:

[root@node1 ~]# chkconfig --list | grep 3:on
acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
anacron         0:off   1:off   2:on    3:on    4:on    5:on    6:off
apmd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
irqbalance      0:off   1:off   2:on    3:on    4:on    5:on    6:off
kudzu           0:off   1:off   2:off   3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
rawdevices      0:off   1:off   2:off   3:on    4:on    5:on    6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
syslog          0:off   1:off   2:on    3:on    4:on    5:on    6:off

PS: The services that will be managed by Pacemaker (the Cluster Resource Manager, CRM), in this article PostgreSQL and DRBD, must not be set to start automatically at boot, because Pacemaker itself will start and stop them.
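
As an illustration (the exact set of services varies between installs), a service you don't need, for example bluetooth if it is present on your system, is removed from automatic initialization like this:

chkconfig --level 2345 bluetooth off
chkconfig --list bluetooth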

At this point, we need to reboot both nodes to apply configuration.


3. Installing prerequisites and cluster packages

There are some packages that need to be installed first:

yum install -y postgresql84* gcc perl-MailTools perl-DBI php-pgsql

To install the cluster packages, we'll need to add the EPEL repository:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm

This link points to the EPEL package for CentOS 5 64-bit; be sure to adjust it if this is not your distro/version.

Now, we install the ClusterLabs EPEL repository:

wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm/epel-5/clusterlabs.repo

If everything is fine, go ahead and install the cluster and DRBD packages:

yum install -y pacemaker corosync drbd83 kmod-drbd83 heartbeat


4. Configuring DRBD

First, we need to configure /etc/drbd.conf on both nodes:

vi /etc/drbd.conf 
global {
    usage-count no;
}
common {
    syncer { rate 100M; }
    protocol      C;
}
resource postgres {
    startup {
       wfc-timeout 0;
       degr-wfc-timeout 120;
    }
    disk { on-io-error detach; }
    on node1.clusterbr.int {
       device      /dev/drbd0;
       disk        /dev/sdb;
       address     172.16.0.1:7791;
       meta-disk   internal;
    }
    on node2.clusterbr.int {
       device      /dev/drbd0;
       disk        /dev/sdb;
       address     172.16.0.2:7791;
       meta-disk   internal;
    }
}

The main points of the configuration are:

  • resource: the resource that will be managed by DRBD; note that we called it "postgres"
  • disk: the device that DRBD will use (a disk or partition)
  • address: the IP address/port that DRBD will use (note that we point to the cross-over interfaces)
  • syncer: the transfer rate between the nodes (we use 100M because we have Gigabit cards)

If you have doubts, please look at the DRBD Users Guide: www.drbd.org/users-guide-emb/
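
Before moving on, drbdadm can parse the configuration back to you; this is a quick sanity check for syntax errors (run it on both nodes):

drbdadm dump postgres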


After that configuration, we can create the metadata on the postgres resource. On both nodes, do:

drbdadm create-md postgres

node1:

[root@node1 ~]# drbdadm create-md postgres
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created. 

node2:

[root@node2 ~]# drbdadm create-md postgres
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created. 

Next, we need to bring the resource up and connect it to its peer. Again, on both nodes, do:

drbdadm up postgres

Now we can start the initial sync between the nodes. This needs to be done only on the primary node; here we chose node1. So, only on node1:

drbdadm -- --overwrite-data-of-peer primary postgres

To check the progress of the sync, and status of DRBD resource, look at /proc/drbd:

cat /proc/drbd
[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
ns:48128 nr:0 dw:0 dr:48128 al:0 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:8340188
[>....................] sync'ed:  0.6% (8144/8188)M delay_probe: 7
finish: 0:11:29 speed: 12,032 (12,032) K/sec 

Now, we need to wait for the sync to finish. This may take a long time, depending on the size and performance of your disks (and, of course, on the speed of the cluster network interfaces connected with the cross-over cable).
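
If you want to follow the sync progress without retyping the command, the watch utility (present on a default CentOS install) will refresh it for you:

watch -n1 cat /proc/drbd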

When the sync process ends, we can take a look at the status of the resource postgres:

node1:

[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:8388316 nr:0 dw:0 dr:8388316 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0   

node2:

[root@node2 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
ns:0 nr:8388316 dw:8388316 dr:0 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0   

To learn what all the status information means, take a look at: www.drbd.org/users-guide-emb/ch-admin.html#s-proc-drbd


5. Configuring PostgreSQL

First, we need to start the DRBD service, so we can run initdb. On both nodes, do:

/etc/init.d/drbd start

As we chose before, node1 will be the primary. To make sure, on node1:

[root@node1 ~]# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:8388316 nr:0 dw:0 dr:8388316 al:0 bm:512 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0   

The Primary/Secondary info means that the local server is the Primary and the other one is the Secondary. The UpToDate/UpToDate info means that the resource is up to date on both nodes.
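
If you prefer, the same information can be queried directly with drbdadm; on node1 these should report Primary/Secondary and UpToDate/UpToDate, respectively:

drbdadm role postgres
drbdadm dstate postgres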

Next, we need to format the DRBD device. Here we chose ext3 as the filesystem. Only on node1, do:

mkfs.ext3 /dev/drbd0 

Afterwards, we can mount the device. The mountpoint we use is the default PostgreSQL location on Red Hat-based systems. Only on node1, do:

mount -t ext3 /dev/drbd0 /var/lib/pgsql

Next, we change the owner and group of the mountpoint. Only on node1, do:

chown postgres.postgres /var/lib/pgsql

Now, we need to initialize the postgresql database. Only on node1, do:

su - postgres 
initdb /var/lib/pgsql/data
exit

I prefer to enable trusted authentication for the node and cluster IPs. Only on node1, do:

echo "host  all   all   10.0.0.191/32   trust" >> /var/lib/pgsql/data/pg_hba.conf
echo "host  all   all   10.0.0.192/32   trust" >> /var/lib/pgsql/data/pg_hba.conf
echo "host  all   all   10.0.0.190/32   trust" >> /var/lib/pgsql/data/pg_hba.conf 

The other config we need is to make PostgreSQL listen on all interfaces. Only on node1, do:

vi /var/lib/pgsql/data/postgresql.conf

Uncomment and change only the line:

listen_addresses = '0.0.0.0' 

Now, we start postgres. Only on node1, do:

/etc/init.d/postgresql start

Then, we can create an admin user to manage postgresql: Only on node1, do:

su - postgres
createuser --superuser admpgsql --pwprompt

You'll need to set a password for admpgsql.

Afterwards, we create a database and populate it with pgbench. Only on node1, do:

su - postgres
createdb pgbench
pgbench -i pgbench

pgbench populates the database with some data; the objective is just to test PostgreSQL:

-bash-3.2$ pgbench -i pgbench
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
10000 tuples done.
20000 tuples done.
30000 tuples done.
40000 tuples done.
50000 tuples done.
60000 tuples done.
70000 tuples done.
80000 tuples done.
90000 tuples done.
100000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.

Now, we'll access the database to check that everything is ok. Only on node1, do:

psql -U admpgsql -d pgbench
select * from pgbench_tellers;
psql -U admpgsql -d pgbench
psql (8.4.5)
Type "help" for help.

pgbench=# select * from pgbench_tellers;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |        0 |
   2 |   1 |        0 |
   3 |   1 |        0 |
   4 |   1 |        0 |
   5 |   1 |        0 |
   6 |   1 |        0 |
   7 |   1 |        0 |
   8 |   1 |        0 |
   9 |   1 |        0 |
  10 |   1 |        0 |
(10 rows)

With this, all the PostgreSQL configuration is done.

Checking if PostgreSQL will work on node2

Before we start managing the services with Pacemaker, it's better to test whether postgres will work on node2.

node1:

First, on node1, we need to stop postgresql:

/etc/init.d/postgresql stop

Then, we unmount the DRBD device:

umount /dev/drbd0

Now, we need to make node1 the Secondary on the DRBD resource:

drbdadm secondary postgres

node2:

First, on node2, we need to promote node2 to Primary on the DRBD resource:

drbdadm primary postgres

Then, we mount the DRBD device:

mount -t ext3 /dev/drbd0 /var/lib/pgsql/

Finally, we start postgresql:

 /etc/init.d/postgresql start

Let's check if we can access the pgbench db on node2:

psql -U admpgsql -d pgbench
select * from pgbench_tellers;
[root@node2 ~]# psql -U admpgsql -d pgbench
psql (8.4.5)
Type "help" for help.

pgbench=# select * from pgbench_tellers;
 tid | bid | tbalance | filler
-----+-----+----------+--------
   1 |   1 |        0 |
   2 |   1 |        0 |
   3 |   1 |        0 |
   4 |   1 |        0 |
   5 |   1 |        0 |
   6 |   1 |        0 |
   7 |   1 |        0 |
   8 |   1 |        0 |
   9 |   1 |        0 |
  10 |   1 |        0 |
(10 rows)

Now that everything is ok, we should stop all the services to start the cluster configuration:

node2:

/etc/init.d/postgresql stop
umount /dev/drbd0
drbdadm secondary postgres
/etc/init.d/drbd stop

node1:

drbdadm primary postgres
/etc/init.d/drbd stop

We need to ensure that none of these services start automatically at boot. On both nodes, do:

chkconfig --level 35 drbd off
chkconfig --level 35 postgresql off
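
A quick check that the cluster-managed services really won't start at boot (both should show "off" for runlevels 3 and 5):

chkconfig --list drbd
chkconfig --list postgresql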

6. Configuring Corosync (openAIS)

The Corosync project is derived from the OpenAIS project, and since Pacemaker works very well with Corosync, that's what we'll use here.

node1: To configure Corosync, we first collect some information about our environment. Only on node1, do:

export ais_port=4000
export ais_mcast=226.94.1.1
export ais_addr=`ip address show eth0 | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/`

Then, we check the data:

env | grep ais_

Important: The variable ais_addr must contain the network address on which the cluster will listen. In our setup, this address is 10.0.0.0.
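
Before generating the config file, it's worth confirming the value; if your netmask is not 255.255.255.0, the sed trick above may not produce the right network address, and you can simply set it by hand:

echo $ais_addr
# should print 10.0.0.0; if it doesn't, set it manually:
export ais_addr=10.0.0.0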

Now we create the corosync config file:

cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
sed -i.gres "s/.*mcastaddr:.*/mcastaddr:\ $ais_mcast/g" /etc/corosync/corosync.conf
sed -i.gres "s/.*mcastport:.*/mcastport:\ $ais_port/g" /etc/corosync/corosync.conf
sed -i.gres "s/.*bindnetaddr:.*/bindnetaddr:\ $ais_addr/g" /etc/corosync/corosync.conf

Let's add some information to the file:

cat <<-END >>/etc/corosync/corosync.conf
aisexec {
user: root
group: root
}
END

cat <<-END >>/etc/corosync/corosync.conf
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 0
}
END

The /etc/corosync/corosync.conf file should now look like this:

compatibility: whitetank

totem {
   version: 2
   secauth: off
   threads: 0
   interface {
       ringnumber: 0
       bindnetaddr: 10.0.0.0
       mcastaddr: 226.94.1.1
       mcastport: 4000
   }
}

logging {
   fileline: off
   to_stderr: yes
   to_logfile: yes
   to_syslog: yes
   logfile: /tmp/corosync.log
   debug: off
   timestamp: on
   logger_subsys {
      subsys: AMF
      debug: off
   }
}

amf {
   mode: disabled
}
aisexec {
   user: root
   group: root
}
service {
   # Load the Pacemaker Cluster Resource Manager
   name: pacemaker
   ver: 0
}     

From node1, we'll transfer the corosync config files to node2:

scp /etc/corosync/* node2:/etc/corosync/

both nodes:

On both nodes, we need to create the logs directory:

mkdir /var/log/cluster/ 

node1: Afterwards, only on node1, start the corosync service:

/etc/init.d/corosync start

Let's check if the service is ok:

grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
[root@node1 bin]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
Apr  7 12:37:21 node1 corosync[23533]:   [MAIN  ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Apr  7 12:37:21 node1 corosync[23533]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Let's check if corosync started on the right interface:

grep TOTEM /var/log/messages
[root@node1 bin]# grep TOTEM /var/log/messages
Apr  7 12:37:21 node1 corosync[23533]:   [TOTEM ] Initializing transport (UDP/IP).
Apr  7 12:37:21 node1 corosync[23533]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr  7 12:37:21 node1 corosync[23533]:   [TOTEM ] The network interface [10.0.0.191] is now up.
Apr  7 12:37:21 node1 corosync[23533]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.

Let's check if pacemaker is up:

grep pcmk_startup /var/log/messages
[root@node1 bin]# grep pcmk_startup /var/log/messages
Apr  7 12:37:21 node1 corosync[23533]:   [pcmk  ] info: pcmk_startup: CRM: Initialized
Apr  7 12:37:21 node1 corosync[23533]:   [pcmk  ] Logging: Initialized pcmk_startup
Apr  7 12:37:21 node1 corosync[23533]:   [pcmk  ] info: pcmk_startup: Maximum core file size is: 4294967295
Apr  7 12:37:21 node1 corosync[23533]:   [pcmk  ] info: pcmk_startup: Service: 9
Apr  7 12:37:21 node1 corosync[23533]:   [pcmk  ] info: pcmk_startup: Local hostname: node1 

Let's check that the corosync process and the Pacemaker daemons are up:

ps axf
[root@node1 bin]# ps axf
(should contain something like this)
23533 ?        Ssl    0:00 corosync
23539 ?        SLs    0:00  \_ /usr/lib/heartbeat/stonithd
23540 ?        S      0:00  \_ /usr/lib/heartbeat/cib
23541 ?        S      0:00  \_ /usr/lib/heartbeat/lrmd
23542 ?        S      0:00  \_ /usr/lib/heartbeat/attrd
23543 ?        S      0:00  \_ /usr/lib/heartbeat/pengine
23544 ?        S      0:00  \_ /usr/lib/heartbeat/crmd  


node2:

Afterwards, if everything is ok on node1, then we can bring corosync up on node2:
/etc/init.d/corosync start

both nodes: Now, we can check the status of the cluster. Run the following command on any node:

crm_mon -1
[root@node1 ~]# crm_mon -1
============
Last updated: Fri Oct 29 17:44:36 2010
Stack: openais
Current DC: node1.clusterbr.int - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1.clusterbr.int node2.clusterbr.int ] 

Make sure that both nodes are up and shown as online.

Set Corosync to automatic initialization (both nodes):

chkconfig --level 35 corosync on


7. Configuring Pacemaker

Pacemaker has many good features; one of them is that it automatically replicates the cluster configuration between the nodes. So administrative tasks (like configuration changes) made on any node are applied to the entire cluster. Therefore, every crm command used in this article can be run on any node, but only once (do not repeat the command on more than one node).

Important commands for cluster management

To check cluster configuration:

crm_verify -L

To list cluster status and return to command prompt:

crm_mon -1

To list cluster status and stay on the status screen (Ctrl+C to exit):

crm_mon

To list cluster configuration:

crm configure show

To open the crm console (type quit to exit):

crm


Configuring Stonith

Stonith is a security feature of the cluster that (among other things) forcibly shuts down a cluster node that is having problems. To do this properly, it uses dedicated hardware. The first thing we need to do in the cluster configuration is to either configure or disable Stonith. In this article we'll disable Stonith, but you can use it. If you want to know how, take a look at: http://www.clusterlabs.org/doc/crm_fencing.html
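
Purely as an illustration (this howto does not rely on it, and an ssh-based fence is never suitable for production), a testing-only fencing resource using the external/ssh plugin from cluster-glue could look roughly like the lines below; the plugin and parameter names are assumptions here, so check "stonith -L" to see what your installation actually provides:

crm configure primitive st-ssh stonith:external/ssh \
     params hostlist="node1.clusterbr.int node2.clusterbr.int"
crm configure clone fencing st-ssh

In this article, however, we simply disable Stonith.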

First, if we check the cluster configuration now, we should get some errors from Stonith:

crm_verify -L

So, to disable Stonith, we use the following command (on one of the nodes):

crm configure property stonith-enabled=false

Now, checking the cluster configuration, we should get no errors:

crm_verify -L

Cluster General Configuration

Run the commands once, on any node.

Since this is a two-node cluster, we tell Pacemaker to ignore the loss of quorum. For more information, look at the Pacemaker documentation.

crm configure property no-quorum-policy=ignore

Configuring resource stickiness, i.e. how strongly a resource prefers to stay where it is. When a node goes down and comes back up, this setting keeps the resources on the server that stayed up the whole time, instead of moving them back right away. This is very good to prevent sync problems on the node that was down, and to prevent a flapping node from flapping the cluster services.

crm configure rsc_defaults resource-stickiness=100

Showing configuration:

crm configure show
[root@node1 ~]# crm configure show
node node1.clusterbr.int
node node2.clusterbr.int
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"


Configuring DBIP

We need a cluster IP. To configure it, execute:

crm configure primitive DBIP ocf:heartbeat:IPaddr2 \
     params ip=10.0.0.190 cidr_netmask=24 \
     op monitor interval=30s

Showing status:

[root@node1 ~]# crm_mon -1
============
Last updated: Fri Oct 29 17:47:53 2010
Stack: openais
Current DC: node1.clusterbr.int - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node2.clusterbr.int node1.clusterbr.int ]

 DBIP   (ocf::heartbeat:IPaddr2):       Started node2.clusterbr.int   

Note that the cluster status shows where the resource is running. Here it's running on node2, but it could be on node1.
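
If you just want to know where a given resource is currently running, without the full status screen, crm_resource can locate it, and a quick ping confirms that the cluster IP is answering:

crm_resource -W -r DBIP
ping -c 2 10.0.0.190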


Configuring DRBD on Cluster

Adding DRBD resource on cluster:

crm configure primitive drbd_postgres ocf:linbit:drbd \
                    params drbd_resource="postgres" \
                    op monitor interval="15s"


Configure the DRBD primary and secondary node:

crm configure ms ms_drbd_postgres drbd_postgres \
                    meta master-max="1" master-node-max="1" \
                         clone-max="2" clone-node-max="1" \
                         notify="true"


Configure the DRBD mounting filesystem (and mountpoint):

crm configure primitive postgres_fs ocf:heartbeat:Filesystem \
                    params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"

Configuring PostgreSQL on Cluster

Adding the postgresql resource on cluster:

crm configure primitive postgresql ocf:heartbeat:pgsql \
  op monitor depth="0" timeout="30" interval="30"

Now, we need to group DBIP, postgresql and DRBD mounted filesystem. The name of the group will be "postgres":

crm configure group postgres postgres_fs DBIP postgresql

Colocating the postgres group with the DRBD Primary node:

crm configure colocation postgres_on_drbd inf: postgres ms_drbd_postgres:Master 

Configuring the postgres group to start only after DRBD is promoted:

crm configure order postgres_after_drbd inf: ms_drbd_postgres:promote postgres:start 

Showing cluster configuration:

[root@node1 ~]# crm configure show
node node1.clusterbr.int
node node2.clusterbr.int
primitive DBIP ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.190" cidr_netmask="24" \
        op monitor interval="30s"
primitive drbd_postgres ocf:linbit:drbd \
        params drbd_resource="postgres" \
        op monitor interval="15s"
primitive postgres_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/var/lib/pgsql" fstype="ext3"
primitive postgresql ocf:heartbeat:pgsql \
        op monitor interval="30" timeout="30" depth="0" \
        meta target-role="Started"
group postgres postgres_fs DBIP postgresql \
        meta target-role="Started"
ms ms_drbd_postgres drbd_postgres \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation postgres_on_drbd inf: postgres ms_drbd_postgres:Master
order postgres_after_drbd inf: ms_drbd_postgres:promote postgres:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
[root@node1 ~]#   

Setting the Preferential Node

It's important for Pacemaker to know where we prefer to run the services. To make node1 the preferred node, use:

crm configure location master-prefer-node1 DBIP 50: node1.clusterbr.int

Note that the weight preferring node1 is 50. So if the services are running on node2, Pacemaker will not move them back to node1 automatically, because we configured resource-stickiness to 100 (see above). In other words, even after node1 recovers from its downtime, the cluster will keep the services on node2.
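
If you want to see the scores Pacemaker actually computes (and confirm that the stickiness of 100 outweighs the location preference of 50), the ptest utility shipped with Pacemaker can dump the allocation scores from the live cluster; treat the exact flags as an assumption to verify on your version:

ptest -sL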

Showing status:

[root@node2 ~]# crm_mon -1
============
Last updated: Fri Oct 29 19:54:09 2010
Stack: openais
Current DC: node2.clusterbr.int - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node2.clusterbr.int node1.clusterbr.int ]

 Master/Slave Set: ms_drbd_postgres
     Masters: [ node2.clusterbr.int ]
     Slaves: [ node1.clusterbr.int ]
 Resource Group: postgres
     postgres_fs        (ocf::heartbeat:Filesystem):    Started node2.clusterbr.int
     DBIP       (ocf::heartbeat:IPaddr2):               Started node2.clusterbr.int
     postgresql (ocf::heartbeat:pgsql):                 Started node2.clusterbr.int     

You may get some errors in the status; if so, reboot both nodes so corosync can complete its configuration. After the reboots, you should be able to connect to postgres via the DBIP (10.0.0.190) on TCP port 5432. To test the cluster, you can power off a node or stop the corosync service on it, as shown below.
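
For example, a simple failover test from the node currently running the resources (here node2): stop corosync there, watch the resources come up on node1, then bring corosync back:

[root@node2 ~]# /etc/init.d/corosync stop
(wait a few moments, then on node1)
[root@node1 ~]# crm_mon -1
[root@node2 ~]# /etc/init.d/corosync start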


Cluster management

These commands are very helpful to manage the cluster.

To migrate a resource to another node, do:

crm resource migrate postgres node1.clusterbr.int 

To remove the above migrate command, do:

crm resource unmigrate postgres 

To clean resource messages, do:

 crm resource cleanup postgres 

To stop postgresql service on cluster, do:

 crm resource stop postgresql 

To start postgresql service on cluster, do:

crm resource start postgresql 


8. Creating Webpage to show status

This configuration is very useful to quickly check the cluster status. It must be done on both nodes.

Start apache (if it's not running):

/etc/init.d/httpd start 
chkconfig --level 35 httpd on

Create a cluster directory (under the DocumentRoot):

mkdir /var/www/html/cluster/

To generate the html, do:

crm_mon --daemonize --as-html /var/www/html/cluster/index.html

Let's put it in rc.local so it runs automatically on startup:

echo "crm_mon --daemonize --as-html /var/www/html/cluster/index.html" >> /etc/rc.d/rc.local

To access it, point your browser to http://10.0.0.190/cluster
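
You can also do a quick check from the command line of any machine on the LAN (curl is available on a default CentOS install); the generated page contains the same node and resource status that crm_mon shows:

curl -s http://10.0.0.190/cluster/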


9. Installing phppgadmin to manage postgres

It's very simple. Remember to do this on both nodes:

First, download it:

mkdir /download 
cd /download 
wget 'http://downloads.sourceforge.net/project/phppgadmin/phpPgAdmin%20%5Bbeta%5D/phpPgAdmin-5.0/phpPgAdmin-5.0-beta2.tar.bz2?r=http%3A%2F%2Fphppgadmin.sourceforge.net%2F%3Fpage%3Ddownload&ts=1288189530&use_mirror=ufpr'

Then, install:

tar -jxvf phpPgAdmin-5.0-beta2.tar.bz2 
mv phpPgAdmin-5.0-beta2 /var/www/html/cluster-pgadmin 
chown apache.apache -R /var/www/html/cluster-pgadmin

To access it, point your browser to: http://10.0.0.190/cluster-pgadmin PS: Log in with the user admpgsql and the password that you configured.


10. Accessing from the network

If you need to access postgres from the LAN, don't forget to configure authentication in postgres:

Here, we'll set md5 authentication for the network 10.0.0.0/24. On the node where postgresql is running, do:

echo "host  all   all   10.0.0.0/24   md5">> /var/lib/pgsql/data/pg_hba.conf

Then, restart postgres to reload configuration:

crm resource stop postgresql
crm resource start postgresql


11. Monitoring

Cluster monitoring is mandatory in production scenarios. To make this work with Zabbix, we suggest installing the Zabbix agent on every node and then monitoring, on every node, these items:

  • Check Local Ping (10.0.0.191, 10.0.0.192 and 172.16.0.1, 172.16.0.2)
  • Check DBIP (Cluster IP) 10.0.0.190
  • Check Postgres TCP Port (5432) on DBIP 10.0.0.190
  • General checks, like disk use, memory, processor
  • Use the following (very simple) script, monitor_drbd.sh, which returns 1 when everything is ok and 0 when there is a problem (see below)

Here goes the monitor_drbd.sh to use with Zabbix:

#!/bin/bash

CHECK=`cat /proc/drbd | grep UpToDate/UpToDate | cut -d: -f5 | cut -c1-17`
STRING_OK="UpToDate/UpToDate"

# Compare the two strings.
if [ "$CHECK" == "$STRING_OK" ] ; then
	# Is ok, returning 1
	echo 1;
else
	# Not ok, returning 0
	echo 0;
fi   
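
One way to wire this into Zabbix (the path, key name and service name below are only suggestions; adjust them to your Zabbix install) is to save the script as /usr/local/bin/monitor_drbd.sh, make it executable, and expose it through a UserParameter in zabbix_agentd.conf; then create an item with that key and a trigger that fires when the value is 0:

chmod +x /usr/local/bin/monitor_drbd.sh
echo "UserParameter=drbd.status,/usr/local/bin/monitor_drbd.sh" >> /etc/zabbix/zabbix_agentd.conf
/etc/init.d/zabbix-agent restart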

References

Corosync: http://corosync.org/doku.php
DRBD Users Guide: http://www.drbd.org/users-guide/s-pacemaker-crm.html
Pacemaker and DRBD: http://www.drbd.org/users-guide/s-pacemaker-crm-drbd-backed-service.html
Clusters from Scratch: http://www.clusterlabs.org/mwiki/images/5/56/Cluster_from_Scratch_-_Fedora_12.pdf