1. Download the software packages
heartbeat-2.0.8.tar.gz
libnet-1.1.2.1-2.1.i386.rpm
2. Install heartbeat
# groupadd haclient
# useradd -g haclient hacluster
# rpm -ivh libnet-1.1.2.1-2.1.i386.rpm
# tar zxvf heartbeat-2.0.8.tar.gz
# cd heartbeat-2.0.8
# ./ConfigureMe configure --prefix=/usr/local/heartbeat
# make
# make install
3. Configure heartbeat
# cd /usr/local/heartbeat/
# cp share/doc/heartbeat-2.0.8/haresources share/doc/heartbeat-2.0.8/ha.cf share/doc/heartbeat-2.0.8/authkeys /etc/ha.d/
# cd /etc/ha.d
# vim ha.cf    // uncomment or edit the following directives
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
ucast eth0 10.0.2.51
auto_failback on
watchdog /dev/watchdog
node node1.com
node node2.com
ping 10.0.2.1
respawn hacluster /usr/local/heartbeat/lib/heartbeat/ipfail
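The timing directives above are easy to get out of order when editing ha.cf by hand. As a minimal sketch, the check below verifies that warntime is smaller than deadtime and that initdead is at least twice deadtime. The function name check_ha_timing is hypothetical (it is not part of heartbeat), and the sketch assumes the values are plain integers in seconds, as in the listing above.

```shell
#!/bin/sh
# check_ha_timing: hypothetical helper, not shipped with heartbeat.
# Assumes warntime/deadtime/initdead are plain integers (seconds).
check_ha_timing() {
    cf="$1"
    # Take the first active (uncommented) value of each directive.
    warn=$(awk '$1 == "warntime" {print $2; exit}' "$cf")
    dead=$(awk '$1 == "deadtime" {print $2; exit}' "$cf")
    init=$(awk '$1 == "initdead" {print $2; exit}' "$cf")
    if [ -z "$warn" ] || [ -z "$dead" ] || [ -z "$init" ]; then
        echo "missing directive"
        return 2
    fi
    if [ "$warn" -ge "$dead" ]; then
        echo "warntime should be smaller than deadtime"
        return 1
    fi
    if [ "$init" -lt $((dead * 2)) ]; then
        echo "initdead should be at least twice deadtime"
        return 1
    fi
    echo "ok"
}

# Example (path taken from the steps above):
#   check_ha_timing /etc/ha.d/ha.cf
```

With the values shown above (warntime 10, deadtime 30, initdead 120) both conditions hold.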
# vim haresources
node1.com IPaddr::10.0.2.62/8/eth0:0 Filesystem::10.0.2.48:/data::/var/lib/mysql::nfs mysqld
# vim authkeys
auth 3
3 md5 Hello!
# chmod 600 authkeys
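The shared secret "Hello!" above works for a test setup, but a random key is harder to guess. The sketch below writes an authkeys file in the same two-line format (auth 3 / 3 md5 ...) with a key read from /dev/urandom and sets the 600 permissions from the step above. The function name write_authkeys is hypothetical, and the sketch assumes /dev/urandom, dd, and od are available.

```shell
#!/bin/sh
# write_authkeys: hypothetical helper, not shipped with heartbeat.
# Writes "auth 3" plus an md5 entry with a random 32-hex-character key.
write_authkeys() {
    dest="$1"
    # 16 random bytes rendered as hex, with spaces and newlines removed.
    key=$(dd if=/dev/urandom bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # Create the file inside a subshell so the umask change stays local.
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$dest" )
    chmod 600 "$dest"
}

# Example (path taken from the steps above):
#   write_authkeys /etc/ha.d/authkeys
```

Both nodes must end up with the same file, so generate it once and copy it to the other node rather than running the helper on each.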
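4. Repeat the steps above on the secondary server. For step 3, scp the three configuration files ha.cf, haresources, and authkeys to the secondary server, then change one line in its ha.cf: ucast eth0 10.0.2.50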
5. Start the heartbeat service on the primary server first, then on the secondary server
# service heartbeat start
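6. On the primary server, check that the resources started correctly
# ip ad sh    // check that the virtual IP is configured
# netstat -antp | grep 3306    // check that mysqld is running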
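The virtual-IP check above can be turned into a yes/no test for use in scripts. As a small sketch, the hypothetical has_vip function reads the output of ip addr show on standard input and succeeds only if the given address appears as an inet entry. The address 10.0.2.62 comes from the haresources line in step 3.

```shell
#!/bin/sh
# has_vip: hypothetical helper. Reads "ip addr show" output on stdin.
has_vip() {
    vip="$1"
    # Fixed-string match on "inet <vip>/" so 10.0.2.6 does not match 10.0.2.62
    # and the dots are not treated as regex wildcards.
    grep -qF "inet $vip/"
}

# Example:
#   ip addr show | has_vip 10.0.2.62 && echo "virtual IP is up"
```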
7. Test heartbeat failover
(1) Suppose dbserver1 goes down
# iptables -A OUTPUT -p icmp -d 10.0.2.52 --icmp-type 0 -j DROP
# iptables -A INPUT -p icmp -s 10.0.2.52 --icmp-type 0 -j DROP
# tail -f /var/log/ha-log
heartbeat[3525]: 2009/03/21_11:21:25 WARN: node 10.0.2.52: is dead
heartbeat[3525]: 2009/03/21_11:21:25 info: Link 10.0.2.52:10.0.2.52 dead.
harc[3855]: 2009/03/21_11:21:25 info: Running /etc/ha.d/rc.d/status status
heartbeat[3525]: 2009/03/21_11:21:34 info: dbserver1.com wants to go standby [all]
heartbeat[3525]: 2009/03/21_11:21:34 info: standby: dbserver2.com can take our all resources
heartbeat[3871]: 2009/03/21_11:21:34 info: give up all HA resources (standby).
ResourceManager[3881]: 2009/03/21_11:21:35 info: Releasing resource group: dbserver1.com IPaddr::10.0.2.62/8/eth0:0 mysqld
ResourceManager[3881]: 2009/03/21_11:21:35 info: Running /etc/init.d/mysqld stop
ResourceManager[3881]: 2009/03/21_11:21:36 info: Running /etc/ha.d/resource.d/IPaddr 10.0.2.62/8/eth0:0 stop
IPaddr[3979]: 2009/03/21_11:21:37 INFO: /sbin/ifconfig eth0:0 10.0.2.62 down
IPaddr[3958]: 2009/03/21_11:21:37 INFO: Success
heartbeat[3871]: 2009/03/21_11:21:37 info: all HA resource release completed (standby).
heartbeat[3525]: 2009/03/21_11:21:37 info: Local standby process completed [all]
heartbeat[3525]: 2009/03/21_11:21:38 WARN: 1 lost packet(s) for [dbserver2.com] [367:369]
heartbeat[3525]: 2009/03/21_11:21:38 info: remote resource transition completed.
heartbeat[3525]: 2009/03/21_11:21:38 info: No pkts missing from dbserver2.com!
heartbeat[3525]: 2009/03/21_11:21:38 info: Other node completed standby takeover of all resources.
(2) Suppose dbserver1 recovers
# iptables -F
# tail -f /var/log/ha-log
heartbeat[3525]: 2009/03/21_11:23:48 info: Link 10.0.2.52:10.0.2.52 up.
heartbeat[3525]: 2009/03/21_11:23:48 WARN: Late heartbeat: Node 10.0.2.52: interval 174040 ms
heartbeat[3525]: 2009/03/21_11:23:48 info: Status update for node 10.0.2.52: status ping
heartbeat[3525]: 2009/03/21_11:23:56 info: dbserver2.com wants to go standby [foreign]
heartbeat[3525]: 2009/03/21_11:23:58 info: standby: acquire [foreign] resources from dbserver2.com
heartbeat[4008]: 2009/03/21_11:23:58 info: acquire local HA resources (standby).
ResourceManager[4018]: 2009/03/21_11:23:58 info: Acquiring resource group: dbserver1.com IPaddr::10.0.2.62/8/eth0:0 mysqld
IPaddr[4042]: 2009/03/21_11:23:59 INFO: Resource is stopped
ResourceManager[4018]: 2009/03/21_11:23:59 info: Running /etc/ha.d/resource.d/IPaddr 10.0.2.62/8/eth0:0 start
IPaddr[4118]: 2009/03/21_11:23:59 INFO: Using calculated netmask for 10.0.2.62: 255.0.0.0
IPaddr[4118]: 2009/03/21_11:24:00 DEBUG: Using calculated broadcast for 10.0.2.62: 10.255.255.255
IPaddr[4118]: 2009/03/21_11:24:00 INFO: eval /sbin/ifconfig eth0:0 10.0.2.62 netmask 255.0.0.0 broadcast 10.255.255.255
IPaddr[4118]: 2009/03/21_11:24:00 DEBUG: Sending Gratuitous Arp for 10.0.2.62 on eth0:0 [eth0]
IPaddr[4097]: 2009/03/21_11:24:00 INFO: Success
ResourceManager[4018]: 2009/03/21_11:24:00 info: Running /etc/init.d/mysqld start
heartbeat[4008]: 2009/03/21_11:24:02 info: local HA resource acquisition completed (standby).
heartbeat[3525]: 2009/03/21_11:24:02 info: Standby resource acquisition done [foreign].
heartbeat[3525]: 2009/03/21_11:24:02 info: remote resource transition completed.
(3) Suppose the heartbeat service on dbserver1 fails
# kill -9 5049 // the PID of the heartbeat service at this point
# tail -f /var/log/ha-log // first check the log on dbserver1
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Emergency Shutdown: Master Control process died.
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Killing pid 5049 with SIGTERM
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Killing pid 5052 with SIGTERM
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Killing pid 5053 with SIGTERM
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Killing pid 5054 with SIGTERM
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Killing pid 5055 with SIGTERM
heartbeat[5051]: 2009/03/21_11:35:15 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
# tail -f /var/log/ha-log // then check the log on dbserver2
heartbeat[5317]: 2009/03/21_11:38:12 WARN: node dbserver1.com: is dead
heartbeat[5317]: 2009/03/21_11:38:12 WARN: No STONITH device configured.
heartbeat[5317]: 2009/03/21_11:38:12 WARN: Shared disks are not protected.
heartbeat[5317]: 2009/03/21_11:38:12 info: Resources being acquired from dbserver1.com.
heartbeat[5317]: 2009/03/21_11:38:12 info: Link dbserver1.com:eth0 dead.
harc[5353]: 2009/03/21_11:38:12 info: Running /etc/ha.d/rc.d/status status
heartbeat[5354]: 2009/03/21_11:38:12 info: No local resources [/usr/local/heartbeat/lib/heartbeat/ResourceManager listkeys dbserver2.com] to acquire.
mach_down[5373]: 2009/03/21_11:38:12 info: Taking over resource group IPaddr::10.0.2.62/8/eth0:0
ResourceManager[5393]: 2009/03/21_11:38:12 info: Acquiring resource group: dbserver1.com IPaddr::10.0.2.62/8/eth0:0 mysqld
IPaddr[5417]: 2009/03/21_11:38:13 INFO: Resource is stopped
ResourceManager[5393]: 2009/03/21_11:38:13 info: Running /etc/ha.d/resource.d/IPaddr 10.0.2.62/8/eth0:0 start
IPaddr[5493]: 2009/03/21_11:38:13 INFO: Using calculated netmask for 10.0.2.62: 255.0.0.0
IPaddr[5493]: 2009/03/21_11:38:13 DEBUG: Using calculated broadcast for 10.0.2.62: 10.255.255.255
IPaddr[5493]: 2009/03/21_11:38:14 INFO: eval /sbin/ifconfig eth0:0 10.0.2.62 netmask 255.0.0.0 broadcast 10.255.255.255
IPaddr[5493]: 2009/03/21_11:38:14 DEBUG: Sending Gratuitous Arp for 10.0.2.62 on eth0:0 [eth0]
IPaddr[5472]: 2009/03/21_11:38:14 INFO: Success
ResourceManager[5393]: 2009/03/21_11:38:14 info: Running /etc/init.d/mysqld start
mach_down[5373]: 2009/03/21_11:38:16 info: /usr/local/heartbeat/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5373]: 2009/03/21_11:38:16 info: mach_down takeover complete for node dbserver1.com.
heartbeat[5317]: 2009/03/21_11:38:16 info: mach_down takeover complete.
heartbeat[5317]: 2009/03/21_11:40:49 info: Heartbeat restart on node dbserver1.com
heartbeat[5317]: 2009/03/21_11:40:49 info: Link dbserver1.com:eth0 up.
heartbeat[5317]: 2009/03/21_11:40:49 info: Status update for node dbserver1.com: status init
heartbeat[5317]: 2009/03/21_11:40:49 info: Status update for node dbserver1.com: status up
harc[5687]: 2009/03/21_11:40:49 info: Running /etc/ha.d/rc.d/status status
harc[5697]: 2009/03/21_11:40:49 info: Running /etc/ha.d/rc.d/status status
heartbeat[5317]: 2009/03/21_11:40:50 info: Status update for node dbserver1.com: status active
heartbeat[5317]: 2009/03/21_11:40:50 info: all clients are now paused
harc[5707]: 2009/03/21_11:40:50 info: Running /etc/ha.d/rc.d/status status
heartbeat[5317]: 2009/03/21_11:40:50 info: remote resource transition completed.
heartbeat[5317]: 2009/03/21_11:40:50 info: dbserver2.com wants to go standby [foreign]
heartbeat[5317]: 2009/03/21_11:40:51 info: standby: dbserver1.com can take our foreign resources
heartbeat[5717]: 2009/03/21_11:40:51 info: give up foreign HA resources (standby).
ResourceManager[5727]: 2009/03/21_11:40:51 info: Releasing resource group: dbserver1.com IPaddr::10.0.2.62/8/eth0:0 mysqld
ResourceManager[5727]: 2009/03/21_11:40:51 info: Running /etc/init.d/mysqld stop
ResourceManager[5727]: 2009/03/21_11:40:53 info: Running /etc/ha.d/resource.d/IPaddr 10.0.2.62/8/eth0:0 stop
IPaddr[5825]: 2009/03/21_11:40:53 INFO: /sbin/ifconfig eth0:0 10.0.2.62 down
IPaddr[5804]: 2009/03/21_11:40:53 INFO: Success
heartbeat[5717]: 2009/03/21_11:40:53 info: foreign HA resource release completed (standby).
heartbeat[5317]: 2009/03/21_11:40:53 info: Local standby process completed [foreign].
heartbeat[5317]: 2009/03/21_11:40:54 info: all clients are now resumed
heartbeat[5317]: 2009/03/21_11:40:54 WARN: 1 lost packet(s) for [dbserver1.com] [14:16]
heartbeat[5317]: 2009/03/21_11:40:54 info: remote resource transition completed.
heartbeat[5317]: 2009/03/21_11:40:54 info: No pkts missing from dbserver1.com!
heartbeat[5317]: 2009/03/21_11:40:54 info: Other node completed standby takeover of foreign resources.
heartbeat[5317]: 2009/03/21_11:41:01 info: dbserver1.com wants to go standby [foreign]
heartbeat[5317]: 2009/03/21_11:41:02 info: standby: acquire [foreign] resources from dbserver1.com
heartbeat[5850]: 2009/03/21_11:41:02 info: acquire local HA resources (standby).
heartbeat[5850]: 2009/03/21_11:41:02 info: local HA resource acquisition completed (standby).
heartbeat[5317]: 2009/03/21_11:41:02 info: Standby resource acquisition done [foreign].
heartbeat[5317]: 2009/03/21_11:41:02 info: remote resource transition completed.
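The dbserver2 log shows that only about 2 minutes after the resources moved to dbserver2, the heartbeat service on dbserver1 started again. This is ipfail at work: that process rebooted dbserver1. If heartbeat is set to start at boot on dbserver1, all resources switch back to dbserver1 once it has finished booting. The heartbeat service must therefore be enabled at boot.
# chkconfig --level 35 heartbeat on    // run on both machines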
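The ha-log excerpts above are long, and most lines are routine. As a rough sketch, the hypothetical failover_events filter keeps only the lines that mark a node death, a resource group being acquired or released, or a completed takeover. The patterns are copied from the log messages shown in the tests above and are not an exhaustive list of heartbeat messages.

```shell
#!/bin/sh
# failover_events: hypothetical helper. Filters ha-log lines down to the key
# failover markers. Reads the named files, or stdin when no file is given.
failover_events() {
    grep -E 'is dead|Acquiring resource group|Releasing resource group|takeover complete' "$@"
}

# Example (log path taken from ha.cf above):
#   failover_events /var/log/ha-log
```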