I set up a lab configuration and will post some of the settings and logs so you can compare.
I have Informix server ifx_a as the primary on machine1 and Informix server ifx_b as the HDR secondary on machine2.
On Informix server ifx_a I have the following replication parameters (HA_ALIAS is set to the same value as DBSERVERNAME, to keep it simple). The parameters for Informix server ifx_b are the same, except for HA_ALIAS and DBSERVERNAME.
DRAUTO 3
DRINTERVAL 0
HDR_TXN_SCOPE ASYNC
DRTIMEOUT 30
HA_ALIAS ifx_a
HA_FOC_ORDER SDS,HDR,RSS
DRLOSTFOUND $INFORMIXDIR/etc/dr.lostfound
DRIDXAUTO 0
LOG_INDEX_BUILDS 1
SDS_ENABLE
SDS_TIMEOUT 20
SDS_TEMPDBS
SDS_PAGING
SDS_LOGCHECK 10
SDS_ALTERNATE NONE
SDS_FLOW_CONTROL 0
UPDATABLE_SECONDARY 0
FAILOVER_CALLBACK
FAILOVER_TX_TIMEOUT 0
TEMPTAB_NOLOG 1
DELAY_APPLY 0
STOP_APPLY 0
LOG_STAGING_DIR $INFORMIXDIR/tmp
RSS_FLOW_CONTROL 0
SMX_NUMPIPES 1
ENABLE_SNAPSHOT_COPY 0
SMX_COMPRESS 0
SMX_PING_INTERVAL 10
SMX_PING_RETRY 6
CLUSTER_TXN_SCOPE SERVER
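Since the two servers are expected to differ only in HA_ALIAS and DBSERVERNAME, one quick way to check the same thing on your side is to diff the two onconfig files. The following is a minimal sketch, assuming each file uses the usual `PARAMETER value` layout with `#` starting a comment. The helper names `parse_onconfig` and `diff_onconfig` are made up for illustration and are not part of any Informix tool.

```python
def parse_onconfig(text):
    """Return {PARAMETER: value} from onconfig-style text.

    Assumption: one 'NAME value' pair per line, '#' starts a comment.
    A parameter with no value (e.g. SDS_ENABLE above) maps to ''.
    If a parameter is repeated, the last occurrence wins.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def diff_onconfig(a_text, b_text):
    """Return {PARAMETER: (value_in_a, value_in_b)} for every difference."""
    a, b = parse_onconfig(a_text), parse_onconfig(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Hypothetical excerpts; in practice read the real onconfig files.
    ifx_a = "DBSERVERNAME ifx_a\nHA_ALIAS ifx_a\nDRAUTO 3\nDRTIMEOUT 30\n"
    ifx_b = "DBSERVERNAME ifx_b\nHA_ALIAS ifx_b\nDRAUTO 3\nDRTIMEOUT 30\n"
    for name, (va, vb) in diff_onconfig(ifx_a, ifx_b).items():
        print(f"{name}: ifx_a={va!r} ifx_b={vb!r}")
```

Anything other than DBSERVERNAME and HA_ALIAS showing up in the result would be a difference from my setup worth looking at.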
The sqlhosts file for both Informix servers is:
ifx_a onsoctcp machine1.local 15010 k=1
ifx_b onsoctcp machine2.local 15020 k=1
Replication is configured and running. On the secondary, ifx_b, I ran:
$ onstat -g cluster
IBM Informix Dynamic Server Version 12.10.FC12 -- Read-Only (Sec) -- Up 00:05:08 -- 156276 Kbytes
Primary Server:ifx_a
Index page logging status: Enabled
Index page logging was enabled at: 2018/07/31 23:44:14
Server   ACKed Log     Supports   Status
         (log, page)   Updates
ifx_b    16,6          No         ASYNC(HDR),Connected,On
$ onstat -g dri
IBM Informix Dynamic Server Version 12.10.FC12 -- Read-Only (Sec) -- Up 00:12:23 -- 156276 Kbytes
Data Replication at 0x45ac6028:
Type            State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
HDR Secondary   on      ifx_a           16 / 5                 N
DRINTERVAL 0
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /opt/IBM/informix/12.10/etc/dr.lostfound
DRIDXAUTO 0
ENCRYPT_HDR 0
Backlog 0
Last Send 2018/08/01 00:12:12
Last Receive 2018/08/01 00:12:12
Last Ping 2018/08/01 00:12:03
Last log page applied(log id,page): 0,0
On the primary, ifx_a, I ran:
$ onstat -g cluster
IBM Informix Dynamic Server Version 12.10.FC12 -- On-Line (Prim) -- Up 00:21:30 -- 156276 Kbytes
Primary Server:ifx_a
Current Log Page:16,6
Index page logging status: Enabled
Index page logging was enabled at: 2018/07/31 23:44:14
Server   ACKed Log     Applied Log   Supports   Status
         (log, page)   (log, page)   Updates
ifx_b    16,6          16,6          No         ASYNC(HDR),Connected,On
$ onstat -g dri
IBM Informix Dynamic Server Version 12.10.FC12 -- On-Line (Prim) -- Up 00:27:49 -- 156276 Kbytes
Data Replication at 0x45ac6028:
Type      State   Paired server   Last DR CKPT (id/pg)   Supports Proxy Writes
primary   on      ifx_b           16 / 5                 NA
DRINTERVAL 0
DRTIMEOUT 30
DRAUTO 3
DRLOSTFOUND /opt/IBM/informix/12.10/etc/dr.lostfound
DRIDXAUTO 0
ENCRYPT_HDR 0
Backlog 0
Last Send 2018/08/01 00:11:53
Last Receive 2018/08/01 00:11:53
Last Ping 2018/08/01 00:11:32
Last log page applied(log id,page): 16,6
Replication is working and the servers are in sync.
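"In sync" here means that, on the primary, the ACKed and applied log positions reported for ifx_b match the current log page (16,6 in all three places). If you want to check that repeatedly while reproducing the problem, the following is a rough sketch that reads the text of the primary's `onstat -g cluster` output. The row layout is assumed from the output pasted above and may differ on other versions; `cluster_status` and `lagging` are illustrative names only.

```python
import re

# Assumed row layout: server, ACKed (log,page), applied (log,page),
# supports-updates flag, status string.
ROW = re.compile(r"^\s*(\S+)\s+(\d+),(\d+)\s+(\d+),(\d+)\s+(\S+)\s+(\S+)\s*$")
CURRENT = re.compile(r"Current Log Page:\s*(\d+),(\d+)")


def cluster_status(output):
    """Return (current_log_page, rows) parsed from onstat -g cluster text."""
    current = None
    rows = []
    for line in output.splitlines():
        m = CURRENT.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = ROW.match(line)
        if m:
            rows.append({
                "server": m.group(1),
                "acked": (int(m.group(2)), int(m.group(3))),
                "applied": (int(m.group(4)), int(m.group(5))),
                "updatable": m.group(6),
                "status": m.group(7),
            })
    return current, rows


def lagging(output):
    """Return names of secondaries whose positions do not match the primary."""
    current, rows = cluster_status(output)
    behind = []
    for row in rows:
        # Without a 'Current Log Page' line, only compare ACKed vs applied.
        target = current if current is not None else row["acked"]
        if row["acked"] != target or row["applied"] != target:
            behind.append(row["server"])
    return behind
```

An empty list from `lagging` corresponds to the state shown above. A secondary that stays in the list on a busy system is not necessarily a problem with ASYNC HDR, so treat this only as a convenience for comparing snapshots.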
On a third machine, machine3, I configured the Connection Manager with the following parameters:
NAME icm_1
LOG 1
LOGFILE ${INFORMIXDIR}/tmp/icm_1.log
DEBUG 1
CM_TIMEOUT 60
EVENT_TIMEOUT 60
SECONDARY_EVENT_TIMEOUT 60
SQLHOSTS LOCAL
CLUSTER ifx_abc
{
INFORMIXSERVER servers_abc
SLA sla_ifx_abc DBSERVERS=PRI
FOC ORDER=ENABLED \
PRIORITY=1
}
And the sqlhosts file for the Connection Manager:
servers_abc group - - i=10,c=1,e=ifx_b
ifx_a onsoctcp machine1.local 15010 k=1,g=servers_abc
ifx_b onsoctcp machine2.local 15020 k=1,g=servers_abc
sla_ifx_abc onsoctcp machine3.local 15030 k=1
Now I forcibly pull machine1, and I get the following in the online log of Informix server ifx_b:
08/02/18 01:00:26 Maximum server connections 1
08/02/18 01:00:26 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 38, Llog used 0
08/02/18 01:06:39 The SMX connection between high availability servers was closed because the
peer server was unresponsive for the timeout period (60 seconds times the
number of retries).
08/02/18 01:06:39 The SMX connection between high availability servers was closed because the
peer server was unresponsive for the timeout period (60 seconds times the
number of retries).
08/02/18 01:07:28 DR: ping timeout
08/02/18 01:07:28 DR: Receive error
08/02/18 01:07:28 dr_secrcv thread : asfcode = -25582: oserr = 4: errstr = : Network connection is broken.
System error = 4.
08/02/18 01:07:28 DR_ERR set to -1
08/02/18 01:07:28 DR: Receive Btree error
08/02/18 01:07:29 DR: Turned off on secondary server
08/02/18 01:07:36 SCHAPI: Issued Task() or Admin() command "task( 'ha make primary force', 'ifx_b' )".
08/02/18 01:07:37 Skipping failover callback.
08/02/18 01:07:48 Logical Recovery has reached the transaction cleanup phase.
08/02/18 01:07:48 Checkpoint Completed: duration was 0 seconds.
08/02/18 01:07:48 Thu Aug 2 - loguniq 19, logpos 0xcb4018, timestamp: 0xf287b Interval: 148
08/02/18 01:07:48 Maximum server connections 2
08/02/18 01:07:48 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 72, Llog used 1
08/02/18 01:07:48 Logical Recovery Complete.
9001 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks
08/02/18 01:07:48 Logical Recovery Complete.
08/02/18 01:07:48 Performance Advisory: Based on the current workload, the physical log might be too small to
accommodate the time it takes to flush the buffer pool.
08/02/18 01:07:48 Results: The server might block transactions during checkpoints.
08/02/18 01:07:48 Action: If transactions are blocked during the checkpoint, increase the size of the
physical log to at least 716800 KB.
08/02/18 01:07:48 Performance Advisory: Based on the current workload, the logical log space might be too small to
accommodate the time it takes to flush the buffer pool.
08/02/18 01:07:48 Results: The server might block transactions during checkpoints.
08/02/18 01:07:48 Action: If transactions are blocked during the checkpoint, increase the size of the
logical log space to at least 89600 KB.
08/02/18 01:07:48 Performance Advisory: The physical log is too small for automatic checkpoints.
08/02/18 01:07:48 Results: Automatic checkpoints are disabled.
08/02/18 01:07:48 Action: To enable automatic checkpoints, increase the physical log to at least 716800 KB.
08/02/18 01:07:48 Quiescent Mode
08/02/18 01:07:48 Checkpoint Completed: duration was 0 seconds.
08/02/18 01:07:48 Thu Aug 2 - loguniq 19, logpos 0xcb6018, timestamp: 0xf288d Interval: 149
08/02/18 01:07:48 Maximum server connections 2
08/02/18 01:07:48 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 4, Llog used 2
08/02/18 01:07:48 B-tree scanners enabled.
08/02/18 01:07:48 DR: Reservation of the last logical log for log backup turned on
08/02/18 01:07:48 DR: new type = primary, secondary server name = ifx_a
08/02/18 01:07:48 DR: Trying to connect to secondary server = ifx_a
08/02/18 01:07:48 Starting BldNotification
08/02/18 01:07:49 On-Line Mode
08/02/18 01:07:50 SCHAPI: Started dbScheduler thread.
08/02/18 01:07:50 Auto Registration is synced
08/02/18 01:07:50 SCHAPI: Started 2 dbWorker threads.
08/02/18 01:07:51 DR: Cannot connect to secondary server
08/02/18 01:07:51 DR: Turned off on primary server
08/02/18 01:07:51 DDR Log staging: Using the directory /opt/IBM/informix/12.10/tmp/ifmxddrlog_2.
08/02/18 01:07:51 SCHAPI: dbutil threads is already running.
08/02/18 01:07:51 SCHAPI: dbScheduler threads is already running.
08/02/18 01:07:52 Defragmenter cleaner thread now running
08/02/18 01:07:52 Defragmenter cleaner thread cleaned:0 partitions
08/02/18 01:08:12 DR: Cannot connect to secondary server
08/02/18 01:08:12 DR: Turned off on primary server
08/02/18 01:09:14 DR: Cannot connect to secondary server
08/02/18 01:09:14 DR: Turned off on primary server
08/02/18 01:10:14 DR: Cannot connect to secondary server
08/02/18 01:10:14 DR: Turned off on primary server
08/02/18 01:11:14 DR: Cannot connect to secondary server
08/02/18 01:11:14 DR: Turned off on primary server
08/02/18 01:12:14 DR: Cannot connect to secondary server
08/02/18 01:12:14 DR: Turned off on primary server
08/02/18 01:12:53 Checkpoint Completed: duration was 0 seconds.
08/02/18 01:12:53 Thu Aug 2 - loguniq 19, logpos 0xcf0704, timestamp: 0xf3ab0 Interval: 150
08/02/18 01:12:53 Maximum server connections 2
08/02/18 01:12:53 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 39, Llog used 58
08/02/18 01:13:14 DR: Cannot connect to secondary server
08/02/18 01:13:14 DR: Turned off on primary server
And in the Connection Manager log, I get:
Thu Aug 2 00:57:42 2018
00:57:42 IBM Informix Connection Manager
00:57:42 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC12
00:57:42 Build Number: N037
00:57:42 Build Host: hans
00:57:42 Build OS: Linux 2.6.18-128.el5
00:57:42 Build Date: Tue Jun 26 08:55:45 CDT 2018
00:57:42 GLS Version: glslib-6.00.FC13
00:57:42
00:57:42 set CM_TIMEOUT 60
00:57:42 set global EVENT_TIMEOUT 60
00:57:42 set global SECONDARY_EVENT_TIMEOUT 60
00:57:42 DEBUG[TID139959615362880]:add type PRI to SLA sla_ifx_abc [cmsm_sla.c:parse_sla:2412]
00:57:42 SLA sla_ifx_abc listener mode REDIRECT
00:57:42 Connection Manager name is icm_1
00:57:42 Starting Connection Manager...
00:57:42 DEBUG[TID139959615362880]:Password File /opt/IBM/informix/4.10/etc/passwd_file failed error:No such file or directory[2] [cmsm_main.c:cmsm_pw_init:1782]
00:57:42 Warning: Password Manager failed; working in trusted node mode
00:57:42 set the maximum number of file descriptors 32767 failed
00:57:42 DEBUG[TID139959615362880]:setrlimit(RLIMIT_NOFILE, 32767) = -1, error:Operation not permitted[1] [cmsm_main.c:cmsm_run:4901]
00:57:42 the current maximum number of file descriptors is 1024
00:57:42 DEBUG[TID139959615362880]:new daemon pid is 11761 [cmsm_main.c:cmsm_daemonize:4653]
00:57:42 DEBUG[TID139959615362880]:dbservername = servers_abc [cmsm_server.ec:cmsm_primary:2825]
00:57:42 DEBUG[TID139959615362880]:nettype = [cmsm_server.ec:cmsm_primary:2826]
00:57:42 DEBUG[TID139959615362880]:hostname = - [cmsm_server.ec:cmsm_primary:2827]
00:57:42 DEBUG[TID139959615362880]:servicename = - [cmsm_server.ec:cmsm_primary:2828]
00:57:42 DEBUG[TID139959615362880]:options = i=10,c=1,e=ifx_b [cmsm_server.ec:cmsm_primary:2829]
00:57:42 DEBUG[TID139959615362880]:connect to servers_abc for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2845]
00:57:42 DEBUG[TID139959615362880]:connect to group servers_abc server ifx_a for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2866]
00:57:42 DEBUG[TID139959615362880]:create new thread -509528320 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996]
00:57:42 DEBUG[TID139959615362880]:connect to group servers_abc server ifx_b for CLUSTER ifx_abc [cmsm_server.ec:cmsm_primary:2866]
00:57:42 DEBUG[TID139959615362880]:create new thread -511629568 for ifx_b [cmsm_sla.c:cmsm_update_server_ex:996]
00:57:42 listener sla_ifx_abc initializing
00:57:42 DEBUG[TID139959585543936]:listenter sla_ifx_abc host=machine3.local service=15030 nettype=soctcp engtypeon [cmsm_main.c:cmsm_listener_thread:1166]
00:57:42 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
00:57:42 DEBUG[TID139959589746432]:database sysmaster @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145]
00:57:42 DEBUG[TID139959589746432]:ifx_a protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2433]
00:57:42 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator setting primary name = ifx_a [cmsm_arb.c:arb_set_primary:793]
00:57:42 Cluster ifx_abc Arbitrator FOC ORDER=ENABLED PRIORITY=1
00:57:42 Connection Manager successfully connected to ifx_a
00:57:42 DEBUG[TID139959587645184]:CONNECT to @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
00:57:42 DEBUG[TID139959587645184]:database sysmaster @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b1 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145]
00:57:42 Listener sla_ifx_abc DBSERVERS=PRI is active with 4 worker threads
00:57:42 DEBUG[TID139959587645184]:ifx_b protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2433]
00:57:42 Connection Manager successfully connected to ifx_b
00:57:42 DEBUG[TID139959589746432]:server ifx_a version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2994]
00:57:42 DEBUG[TID139959589746432]:ACTIVE MONITOR ifx_a ifx_a 192.168.56.101 49464 [cmsm_er.ec:cmsm_er_monitor:3036]
00:57:42 DEBUG[TID139959589746432]:register CM successfully icm_1 with token 1782701540 [cmsm_server.ec:cmsm_cm_register:2336]
00:57:42 DEBUG[TID139959587645184]:server ifx_b version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2994]
00:57:42 DEBUG[TID139959587645184]:ACTIVE MONITOR ifx_b ifx_b 192.168.56.101 44376 [cmsm_er.ec:cmsm_er_monitor:3036]
00:57:42 DEBUG[TID139959589746432]:Register server ifx_a for ifx_a count = 1 [cmsm_er.ec:cmsm_er_event_process:1281]
00:57:42 DEBUG[TID139959587645184]:register CM successfully icm_1 with token 1120237392 [cmsm_server.ec:cmsm_cm_register:2336]
00:57:42 DEBUG[TID139959587645184]:Register server ifx_b for ifx_b count = 1 [cmsm_er.ec:cmsm_er_event_process:1281]
00:57:43 DEBUG[TID139959615362880]:fcntl(/opt/IBM/informix/4.10/tmp/cmsm.pid.icm_1) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193]
00:57:43 Connection Manager started successfully
00:57:43 DEBUG[TID139959589746432]:SQL get ifx_a event SRV_ADM 16:5 SDS,HDR,RSS [cmsm_er.ec:cmsm_er_event_process:1883]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584]
00:57:43 CM icm_1 arbitrator for ifx_abc is active
00:57:43 Cluster ifx_abc Arbitrator FOC ORDER=SDS,HDR,RSS PRIORITY=1 TIMEOUT=1
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655]
00:57:43 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584]
01:01:25 DEBUG[TID139959583442688]:SLA sla_ifx_abc 3 ifx_a time=1 latency= 0.00 readyQ=19 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947]
01:01:25 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010
01:01:39 DEBUG[TID139959581341440]:SLA sla_ifx_abc 3 ifx_a time=0 latency= 0.00 readyQ=18 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947]
01:01:39 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010
01:01:39 DEBUG[TID139959579240192]:SLA sla_ifx_abc 3 ifx_a time=0 latency= 0.00 readyQ=18 session=0 adjustSession=1 [cmsm_sla.c:select_server_from_sla:2947]
01:01:39 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010
01:02:09 DEBUG[TID139959577138944]:SLA sla_ifx_abc 3 ifx_a time=6 latency= 0.00 readyQ=16 session=0 adjustSession=0 [cmsm_sla.c:select_server_from_sla:2947]
01:02:09 SLA sla_ifx_abc redirect SQLI client from 192.168.56.101 to ifx_a 192.168.56.102.15010
01:06:16 Server ifx_a no response over EVENT_TIMEOUT 60 seconds
01:06:16 Connection Manager disconnected from ifx_a
01:06:16 DEBUG[TID139959589746432]:start failover in sqlbreakcallback for ifx_a [cmsm_server.ec:cmsm_event_callback_common:1274]
01:06:16 DEBUG[TID139959575037696]:starting the failover process [cmsm_server.ec:cmsm_failover_incallback:1184]
01:06:17 Cluster ifx_abc Arbitrator detected primary server is down.
01:06:17 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:06:17 Cluster ifx_abc Arbitrator will not perform failover.
01:06:17 DEBUG[TID139959575037696]:failover process returns -1 [cmsm_server.ec:cmsm_failover_incallback:1207]
01:06:39 DEBUG[TID139959587645184]:create new thread -524237056 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996]
01:06:39 DEBUG[TID139959587645184]:create new thread -526338304 for ifx_a [cmsm_sla.c:cmsm_update_server_ex:996]
01:06:39 DEBUG[TID139959575037696]:Exit server ifx_a monitor machine3.local thread status=8 [cmsm_er.ec:cmsm_er_monitor:2608]
01:07:19 DEBUG[TID139959589746432]:fetch sysrepstats_cursor SQLCODE = (-213,0,) [cmsm_er.ec:cmsm_er_event_process:1711]
01:07:19 DEBUG[TID139959589746432]:unregister event failed, sqlcode = (-1811,0,) [cmsm_server.ec:cmsm_cm_unregister:2383]
01:07:19 DEBUG[TID139959589746432]:Unregister server ifx_a for ifx_a count = 0 [cmsm_er.ec:cmsm_er_event_process:2092]
01:07:19 Connection Manager disconnected from ifx_a
01:07:19 ALARM 3002 detected lost connection to Informix server ifx_a from machine3.local
01:07:19 DEBUG[TID139959589746432]:Connection Manager disconnected from ifx_a sqlcode=-1803 [cmsm_er.ec:cmsm_er_monitor:3090]
01:07:19 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:19 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:19 Cluster ifx_abc Arbitrator will not perform failover.
01:07:19 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:19 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:19 Cluster ifx_abc Arbitrator will not perform failover.
01:07:20 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:20 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:20 Cluster ifx_abc Arbitrator will not perform failover.
01:07:21 DEBUG[TID139959572936448]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a2 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
01:07:21 ALARM 3001 unable to connect to Informix server ifx_a from machine3.local error -908
01:07:21 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:21 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:21 Cluster ifx_abc Arbitrator will not perform failover.
01:07:22 DEBUG[TID139959572936448]:Exit server ifx_a monitor machine3.local thread status=8 [cmsm_er.ec:cmsm_er_monitor:2608]
01:07:24 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a3 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
01:07:24 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:24 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:24 Cluster ifx_abc Arbitrator will not perform failover.
01:07:27 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a4 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
01:07:27 Cluster ifx_abc Arbitrator detected primary server is down.
01:07:27 Cluster ifx_abc Arbitrator detected majority of secondarys are still connected.
01:07:27 Cluster ifx_abc Arbitrator will not perform failover.
01:07:30 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:07:30 DEBUG[TID139959589746432]:CONNECT to @ifx_a|onsoctcp|machine1.local|15010 AS ifx_a5 SQLCODE = (-908,107,ifx_a) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator start failover logic now... [cmsm_arb.c:arb_server_down:1564]
01:07:30 ALARM 2001 failover arbitrator automated failover in progress.
01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator try failover on generic SDS [cmsm_arb.c:arb_failover_by_foc:1174]
01:07:30 DEBUG[TID139959589746432]:Cluster ifx_abc Arbitrator try failover on generic HDR [cmsm_arb.c:arb_failover_by_foc:1200]
01:07:30 DEBUG[TID139959589746432]:CLuster ifx_abc Arbitrator failover to ifx_b, waiting 150 seconds for confirmation [cmsm_arb.c:arb_post_wait:1764]
01:07:30 DEBUG[TID139959589746432]:Arbitrator found failover node = ifx_b [cmsm_server_arb.ec:arb_failover_to:303]
01:07:37 DEBUG[TID139959589746432]:CONNECT to @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b2 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136]
01:07:37 DEBUG[TID139959589746432]:database sysmaster @ifx_b|onsoctcp|machine2.local|15020 AS ifx_b2 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145]
01:07:37 DEBUG[TID139959589746432]:Arbitrator connected to failover node = ifx_b [cmsm_server_arb.ec:arb_failover_to:316]
01:07:37 DEBUG[TID139959589746432]:Arbitrator get HA_ALIAS on ifx_b, HA_ALIAS [ifx_b] [cmsm_server_arb.ec:arb_failover_to:326]
01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:3 1 [cmsm_er.ec:cmsm_er_event_process:1883]
01:07:50 Server ifx_b is in quiescent mode.
01:07:50 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:3 5 [cmsm_er.ec:cmsm_er_event_process:1883]
01:07:50 The server type of cluster ifx_abc server ifx_b is Primary.
01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator setting primary name = ifx_b [cmsm_arb.c:arb_set_primary:793]
01:07:50 Cluster ifx_abc Arbitrator FOC ORDER=SDS,HDR,RSS PRIORITY=1 TIMEOUT=1
01:07:50 Server ifx_b is in on-line mode.
01:07:50 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:5[NEWPRI] ifx_b [cmsm_er.ec:cmsm_er_event_process:1734]
01:07:50 Arbitrator make primary on node = ifx_b successful
01:07:50 ALARM 2002 failover arbitrator automated failover completed
01:07:50 DEBUG[TID139959589746432]:Monitor ifx_a exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2789]
01:07:50 DEBUG[TID139959589746432]:all monitors exited, server ifx_a register count = 0 [cmsm_er.ec:cmsm_er_monitor:3156]
01:07:50 DEBUG[TID139959589746432]:enqueue a cleanup task for ifx_a [cmsm_er.ec:cmsm_enqueu_cleanup:2378]
01:07:50 DEBUG[TID139959587645184]:server monitor ifx_b dequeue a cleanup task [cmsm_er.ec:cmsm_er_event_process:2060]
01:07:50 DEBUG[TID139959587645184]:SQL get ifx_b event SRV_ADM 16:5 SDS,HDR,RSS [cmsm_er.ec:cmsm_er_event_process:1883]
01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator reinitialized CM names [cmsm_arb.c:arb_clear_cms_list:655]
01:07:50 DEBUG[TID139959587645184]:Cluster ifx_abc Arbitrator added CM name = icm_1 [cmsm_arb.c:arb_add_cm:584]
01:07:57 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:08:13 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:09:19 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:10:17 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:11:22 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:12:20 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
01:13:17 DEBUG[TID139959587645184]:get ifx_b CLUST_CHG event 1:6[DROFF] ifx_a [cmsm_er.ec:cmsm_er_event_process:1734]
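To compare timings between my run and yours, it can help to pull the key failover events out of the Connection Manager log. In the log above the arbitrator appears to decline failover while it still considers the secondary connected, and only starts once ifx_b reports the DROFF event at 01:07:30. The following is a small sketch that extracts three markers copied from the log lines above (the EVENT_TIMEOUT message, ALARM 2001 and ALARM 2002). It assumes every line starts with an `HH:MM:SS` stamp and that the log does not cross midnight; the function names are hypothetical.

```python
from datetime import datetime

# Marker strings copied from the Connection Manager log shown above.
MARKERS = {
    "detected": "no response over EVENT_TIMEOUT",
    "started": "ALARM 2001",
    "completed": "ALARM 2002",
}


def failover_timeline(log_text):
    """Return {event: datetime} for the first occurrence of each marker."""
    found = {}
    for line in log_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        stamp, message = parts
        for key, marker in MARKERS.items():
            if key in found or marker not in message:
                continue
            try:
                found[key] = datetime.strptime(stamp, "%H:%M:%S")
            except ValueError:
                # Line does not start with a time stamp (e.g. the date header).
                pass
    return found


def seconds_between(timeline, start, end):
    """Elapsed whole seconds between two events of a timeline."""
    return int((timeline[end] - timeline[start]).total_seconds())
```

On the log above this gives 74 seconds from detection (01:06:16) to the start of failover (01:07:30) and 20 seconds for the promotion itself (01:07:30 to 01:07:50). Running the same extraction on your log should show where your sequence diverges.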
So there are differences, since it looks like you had an AF (Assertion Failure) during the promotion of the secondary.
Could you post more of your logs? Specifically, from right before you pulled the primary until the secondary got stuck in recovery.