As you know, in 11gR2 Oracle uses the UDP protocol for heartbeats between the nodes.
In this post, I present node eviction scenarios in which UDP communication is blocked between the nodes, and you will see that, depending on where and how UDP is blocked, a different outcome can occur.
The test is done on a two-node RAC running 11.2.0.3 PSU3 on Linux.
Scenario 1: When UDP communication is blocked on the second node
In this scenario, outgoing UDP traffic for the ocssd process on node2 is blocked.
To do so, we find the UDP port on which ocssd is listening and then disable any outgoing traffic on it.

netstat -a --inet |grep -i udp | grep -i racnode2
udp        0      0 racnode2-priv:14081        *:*
udp        0      0 racnode2-priv:52358        *:*
udp        0      0 racnode2-priv:52242        *:*
udp        0      0 racnode2-priv:42517        *:*     --> ocssd
udp        0      0 racnode2-priv:31126        *:*
udp        0      0 racnode2-priv:60741        *:*

[root@racnode2 ~]# lsof -i :42517
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
ocssd.bin 3005 grid   55u  IPv4  22340      UDP  racnode2-priv:42517
42517 is the port from which ocssd on racnode2 sends its heartbeat.
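As a shortcut, lsof can give the same answer in a single step by filtering on the process name. This is just a convenience sketch, not part of the original test, and it assumes lsof is available and run as root:

# UDP sockets owned by ocssd.bin; -a ANDs the -i and -c filters, -nP skips name/port resolution
lsof -nP -a -iUDP -c ocssd.bin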
To break the heartbeat communication between node2 and node1, any outgoing traffic on racnode2 for the ocssd process (port 42517) is blocked with this command:
iptables -A OUTPUT -s 192.168.2.152 -p udp --sport 42517 -j DROP
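Before moving on, it is worth confirming that the rule is in place and is actually dropping packets. A quick check (not part of the original test) using the per-rule counters:

# the pkts counter for the DROP rule should keep increasing while the heartbeat is blocked
iptables -vnL OUTPUT | grep 42517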
The alert log on node1 shows that node2 is evicted (a rebootless eviction, so the node itself is not rebooted), and that the cluster is then reconfigured and node2 rejoins it.
[cssd(3015)]CRS-1612:Network communication with node racnode2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.840 seconds
2012-12-31 10:50:54.213
[cssd(3015)]CRS-1611:Network communication with node racnode2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 6.820 seconds
2012-12-31 10:50:58.232
[cssd(3015)]CRS-1610:Network communication with node racnode2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.810 seconds
2012-12-31 10:51:01.056
[cssd(3015)]CRS-1607:Node racnode2 is being evicted in cluster incarnation 249572820; details at (:CSSNM00007:) in /u01/app/11.2.0/grid/log/racnode1/cssd/ocssd.log.
2012-12-31 10:51:02.584
[cssd(3015)]CRS-1625:Node racnode2, number 2, was manually shut down
2012-12-31 10:51:02.590
[cssd(3015)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 .
2012-12-31 10:51:02.630
[crsd(3393)]CRS-5504:Node down event reported for node 'racnode2'.
2012-12-31 10:51:05.827
[crsd(3393)]CRS-2773:Server 'racnode2' has been removed from pool 'Generic'.
2012-12-31 10:51:05.829
[crsd(3393)]CRS-2773:Server 'racnode2' has been removed from pool 'ora.orcl'.
2012-12-31 10:51:37.987
[cssd(3015)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2012-12-31 10:52:13.720
[crsd(3393)]CRS-2772:Server 'racnode2' has been assigned to pool 'Generic'.
2012-12-31 10:52:13.720
[crsd(3393)]CRS-2772:Server 'racnode2' has been assigned to pool 'ora.orcl'.
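While the block is in place, the countdown can also be followed live by tailing the GI alert log on node1 and filtering for the CRS-16xx messages. A minimal sketch, assuming the default 11.2 alert log location under the Grid home:

# follow the eviction countdown (CRS-1612/1611/1610/1607) as it happens
tail -f /u01/app/11.2.0/grid/log/racnode1/alertracnode1.log | grep --line-buffered 'CRS-16'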
ocssd.log on racnode2 has more details; here are a couple of the key lines:
2012-12-31 10:51:01.129: [ CSSD][3019058064]###################################
2012-12-31 10:51:01.129: [ CSSD][3019058064]clssscExit: CSSD aborting from thread clssnmvKillBlockThread
2012-12-31 10:51:01.129: [ CSSD][3019058064]###################################
.
.
2012-12-31 10:51:02.559: [ CSSD][3029027728]clssgmClientShutdown: total iocapables 0
2012-12-31 10:51:02.559: [ CSSD][3029027728]clssgmClientShutdown: graceful shutdown completed.
2012-12-31 10:51:02.559: [ CSSD][3029027728]clssnmSendManualShut: Notifying all nodes that this node has been manually shut down
.
.
2012-12-31 10:51:25.352: [ CSSD][3040868032]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1356979885
2012-12-31 10:51:25.353: [ CSSD][3040868032]clssscmain: Environment is production
.
.
2012-12-31 10:51:26.167: [GIPCHTHR][3024477072] gipchaWorkerCreateInterface: created local interface for node 'racnode2', haName 'CSS_racnode-cluster', inf 'udp://192.168.2.152:29788'
A key point here is that the UDP communication is reconfigured to run on a different port, and after that node2 is able to join the cluster and start up all of its resources.
The following netstat output also confirms that ocssd.bin now listens on the new port:
[root@racnode2 ~]# netstat -a --inet |grep -i udp | grep -i racnode2
udp        0      0 racnode2-priv:31126        *:*
udp        0      0 racnode2-priv:35489        *:*
udp        0      0 racnode2-priv:38321        *:*
udp        0      0 racnode2-priv:60741        *:*
udp        0      0 racnode2-priv:10321        *:*
udp        0      0 racnode2-priv:29788        *:*     --> new port is created.....

[root@racnode2 working]# lsof -i :29788
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
ocssd.bin 5919 grid   52u  IPv4 764918      UDP  racnode2-priv:29788
To sum up, the following sequence of events occurs:
- UDP communication for the heartbeat is blocked (outgoing UDP on the ocssd port)
- Node1 evicts Node2
- Node2 is able to stop all I/O-capable resources and, as a result, there is no need to reboot the node (an 11gR2 rebootless restart)
- Node2 starts CSSD and reconfigures the UDP port
- Node2 is able to join the cluster
This sounds perfect, as node2 is able to recover by itself. It looks like a transparent and straightforward recovery.
Let's see how this failure is recovered if the UDP hiccup occurs on node1 (the master node in this two-node RAC).
Scenario 2: When UDP communication is blocked on the first node
Following the same steps, the UDP port for the heartbeat is found and then blocked, as shown below.

bash-3.2$ netstat -a --inet |grep -i udp | grep -i racnode1
udp        0      0 racnode1-priv:36613        *:*
udp        0      0 racnode1-priv:36892        *:*
udp        0      0 racnode1-priv:26055        *:*
udp        0      0 racnode1-priv:13167        *:*
udp        0      0 racnode1-priv:17914        *:*
udp        0      0 racnode1-priv:51067        *:*

[root@racnode1 ~]# lsof -i :36613
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
ocssd.bin 3010 grid   55u  IPv4  19676      UDP  racnode1-priv:36613
To block the heartbeat, all outgoing traffic on port 36613 is blocked:
iptables -A OUTPUT -s 192.168.2.151 -p udp --sport 36613 -j DROP
Based on scenario 1, I expected to see the same sequence of events. In other words, I expected node2 to be evicted, the cluster to be reconfigured, and node2 to rejoin the cluster.
However, in this case node2 is evicted and then, as shown below, CSSD hangs while starting up and trying to join the cluster.
[root@racnode1 ~]# crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
**************************************************************
[root@racnode2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Abnormal Termination
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE racnode2
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE STARTING
ora.cssdmonitor
1 ONLINE ONLINE racnode2
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE racnode2
ora.evmd
1 ONLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE racnode2
ora.gpnpd
1 ONLINE ONLINE racnode2
ora.mdnsd
1 ONLINE ONLINE racnode2
Even unblocking the port by dropping the rule from iptables does not help; CSS on node2 is still not able to join the cluster.

iptables -D OUTPUT -s 192.168.2.151 -p udp --sport 36613 -j DROP

[root@racnode1 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
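As a side note, if the exact rule specification is not at hand, the rule can also be removed by its position in the chain. A small sketch (the rule number 1 below is only illustrative):

iptables -nL OUTPUT --line-numbers    # list OUTPUT rules together with their rule numbers
iptables -D OUTPUT 1                  # delete by rule number instead of by full specification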
After reviewing all the logs (quite lengthy, so I avoid copying them here!), it can be seen that ocssd on node2 keeps complaining about the missing network heartbeat and no reconfiguration attempt is made; also, on node1, after UDP was blocked the interface was disabled and no attempt is made to set up the communication differently.
As I mentioned earlier, although the UDP port has been unblocked, the following errors are still reported repeatedly on node2 and node1.
Node 2
==========
[ CSSD][3013077904]clssnmvDHBValidateNcopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 249572810, wrtcnt, 181183, LATS 1471304, lastSeqNo 181182, uniqueness 1356788464, timestamp 1356790643/1482244

Node1
=========
[GIPCHALO][3023862672] gipchaLowerProcessNode: no valid interfaces found to node for 25790 ms, node 0xa062a88 { host 'racnode2', haName 'CSS_racnode-cluster', srcLuid be28e076-9f3aafb1, dstLuid 61a1f895-ba260945 numInf 0, contigSeq 2639, lastAck 2626, lastValidAck 2638, sendSeq [2627 : 2683], createTime 4294328280, sentRegister 1, localMonitor 1, flags 0x2408 }
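A quick way to see that these complaints keep accumulating even after the rule has been removed is to grep the logs on both nodes. A rough sketch, assuming the messages land in the respective ocssd.log files (on node1 the GIPCHALO entries may also show up in gipcd.log):

# node2: disk heartbeat is fine, but the network heartbeat from node1 is still missing
grep -c 'has a disk HB, but no network HB' /u01/app/11.2.0/grid/log/racnode2/cssd/ocssd.log

# node1: the GIPC HA layer still sees no valid interface to racnode2
grep -c 'no valid interfaces found to node' /u01/app/11.2.0/grid/log/racnode1/cssd/ocssd.log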
It turned out that the issue is reported as Bug 14281269: "NODE CAN'T REJOIN THE CLUSTER AFTER A TEMPORARY INTERCONNECT FAILURE - PROBLEM: after an interconnect failure on the first node the second node restarts the clusterware (rebootless restart) as expected, but can't join the cluster again till the interconnect interface of node1 is not shutdown/startup manually".
At the time of posting this, there is no patch available and the suggested workaround is to bounce the interconnect interface.
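For reference, bouncing the private interconnect NIC on node1 would look roughly like this. This is only a sketch; the interface name eth1 is an assumption, so confirm the registered interconnect interface first:

/u01/app/11.2.0/grid/bin/oifcfg getif    # shows which interface is registered as cluster_interconnect
ifconfig eth1 down                       # eth1 is assumed to be the private interconnect NIC; adjust as needed
ifconfig eth1 up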
In my test, even bouncing node2 (the evicted node) did not help, so I ended up killing the gipc daemon on node1 (the master/surviving node). That did the trick: the whole cluster recovered and node2 was able to join the cluster.
[root@racnode1 working]# ps -ef |grep -i gipc
grid      2961     1  0 05:40 ?        00:00:16 /u01/app/11.2.0/grid/bin/gipcd.bin
root      7709  4792  0 06:44 pts/1    00:00:00 grep -i gipc
[root@racnode1 working]# kill -9 2961
[root@racnode1 working]# ps -ef |grep -i gipc
grid      7717     1 15 06:44 ?        00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin
root      7755  4792  0 06:44 pts/1    00:00:00 grep -i gipc

===== alert for node1 =========
[/u01/app/11.2.0/grid/bin/oraagent.bin(3528)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:5} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/oraagent_grid/oraagent_grid.log.
2012-12-29 06:44:56.972
[/u01/app/11.2.0/grid/bin/orarootagent.bin(3535)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:2:23} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/orarootagent_root/orarootagent_root.log.
2012-12-29 06:44:56.974
[/u01/app/11.2.0/grid/bin/oraagent.bin(3741)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:5:63} in /u01/app/11.2.0/grid/log/racnode1/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2012-12-29 06:44:57.098
[ohasd(2414)]CRS-2765:Resource 'ora.ctssd' has failed on server 'racnode1'.
2012-12-29 06:44:59.141
[ctssd(7732)]CRS-2401:The Cluster Time Synchronization Service started on host racnode1.
2012-12-29 06:44:59.141
[ctssd(7732)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racnode1.
2012-12-29 06:45:01.164
[cssd(3010)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2012-12-29 06:45:02.363
[crsd(7759)]CRS-1012:The OCR service started on node racnode1.
2012-12-29 06:45:03.155
[evmd(7762)]CRS-1401:EVMD started on node racnode1.
2012-12-29 06:45:05.147
[crsd(7759)]CRS-1201:CRSD started on node racnode1.
2012-12-29 06:45:38.798
[crsd(7759)]CRS-2772:Server 'racnode2' has been assigned to pool 'Generic'.
2012-12-29 06:45:38.800
[crsd(7759)]CRS-2772:Server 'racnode2' has been assigned to pool 'ora.orcl'.

===== alert for node2 =========
2012-12-29 06:39:38.132
[cssd(7700)]CRS-1605:CSSD voting file is online: /dev/sda1; details in /u01/app/11.2.0/grid/log/racnode2/cssd/ocssd.log.
2012-12-29 06:45:01.165
[cssd(7700)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 racnode2 .
2012-12-29 06:45:03.641
[ctssd(8061)]CRS-2401:The Cluster Time Synchronization Service started on host racnode2.
2012-12-29 06:45:03.641
[ctssd(8061)]CRS-2407:The new Cluster Time Synchronization Service reference node is host racnode1.
2012-12-29 06:45:05.257
[ohasd(2405)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2012-12-29 06:45:16.836
[ctssd(8061)]CRS-2408:The clock on host racnode2 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
2012-12-29 06:45:25.475
[crsd(8199)]CRS-1012:The OCR service started on node racnode2.
2012-12-29 06:45:25.541
[evmd(8079)]CRS-1401:EVMD started on node racnode2.
2012-12-29 06:45:27.331
[crsd(8199)]CRS-1201:CRSD started on node racnode2.
2012-12-29 06:45:35.642
[/u01/app/11.2.0/grid/bin/oraagent.bin(8321)]CRS-5016:Process "/u01/app/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_grid/oraagent_grid.log"
2012-12-29 06:45:36.181
[/u01/app/11.2.0/grid/bin/oraagent.bin(8347)]CRS-5011:Check of resource "orcl" failed: details at "(:CLSN00007:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2012-12-29 06:45:37.301
[/u01/app/11.2.0/grid/bin/oraagent.bin(8321)]CRS-5016:Process "/u01/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/racnode2/agent/crsd/oraagent_grid/oraagent_grid.log"
To conclude, in a two-node RAC:
- A network hiccup on the heartbeat port of node2 is recovered automatically.
- A network hiccup on the heartbeat port of node1 requires manual intervention due to Bug 14281269.
- Due to several reported bugs, it is recommended to be at least on 11.2.0.3 PSU3. Check out the MOS note for other related bugs: List of gipc defects that prevent GI from starting/joining after network hiccups [ID 1488378.1].