O problema foi resolvido.
O problema foi da configuração do switch. Quando o modo de filtro multicast é filtrar tudo, o problema aconteceu. O Keepalived corre O.K. quando o modo de filtro multicast é encaminhado para todos.
Obrigado pela ajuda.
Ambos os dois servidores iniciaram keepalived, e o servidor BACKUP transitou para o MASTER STATE imediatamente. os dois se tornaram MASTER agora.
Ambos os dois nós estão enviando mensagem de propaganda VRRP.
no servidor principal:
[root@zhsq1 ~]# tcpdump -c 3 -i em1 host 224.0.0.18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:01:35.526355 IP zhsq1 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 153, authtype simple, intvl 1s, length 20
11:01:36.526497 IP zhsq1 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 153, authtype simple, intvl 1s, length 20
11:01:37.527561 IP zhsq1 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 153, authtype simple, intvl 1s, length 20
no servidor de backup:
[root@zhsq2 ~]# tcpdump -c 3 -i em1 host 224.0.0.18
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:11:04.314996 IP zhsq2 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20
11:11:05.315111 IP zhsq2 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20
11:11:06.316175 IP zhsq2 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20
abaixo é o log do servidor principal:
May 31 11:00:22 zhsq1 Keepalived[31475]: Starting Keepalived v1.2.7 (05/20,2013)
May 31 11:00:22 zhsq1 Keepalived[31476]: Starting Healthcheck child process, pid=31477
May 31 11:00:22 zhsq1 Keepalived[31476]: Starting VRRP child process, pid=31478
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Interface queue is empty
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: No such interface, em2
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Netlink reflector reports IP 10.0.7.60 added
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Netlink reflector reports IP fe80::92b1:1cff:fe4c:bea8 added
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Registering Kernel netlink reflector
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Registering Kernel netlink command channel
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Interface queue is empty
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: No such interface, em2
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Netlink reflector reports IP 10.0.7.60 added
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Netlink reflector reports IP fe80::92b1:1cff:fe4c:bea8 added
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Registering Kernel netlink reflector
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Registering Kernel netlink command channel
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Registering gratuitous ARP shared channel
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Opening file '/etc/keepalived/keepalived.conf'.
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Configuration is using : 4661 Bytes
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Opening file '/etc/keepalived/keepalived.conf'.
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Configuration is using : 63856 Bytes
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: Using LinkWatch kernel netlink reflector...
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: VRRP sockpool: [ifindex(2), proto(112), fd(11,12)]
May 31 11:00:22 zhsq1 Keepalived_healthcheckers[31477]: Using LinkWatch kernel netlink reflector...
May 31 11:00:22 zhsq1 Keepalived_vrrp[31478]: VRRP_Script(chk_http_port) succeeded
May 31 11:00:23 zhsq1 Keepalived_vrrp[31478]: VRRP_Instance(VI_1) Transition to MASTER STATE
May 31 11:00:24 zhsq1 Keepalived_vrrp[31478]: VRRP_Instance(VI_1) Entering MASTER STATE
May 31 11:00:24 zhsq1 Keepalived_vrrp[31478]: VRRP_Instance(VI_1) setting protocol VIPs.
May 31 11:00:24 zhsq1 Keepalived_vrrp[31478]: VRRP_Instance(VI_1) Sending gratuitous ARPs on em1 for 10.0.7.65
May 31 11:00:24 zhsq1 Keepalived_healthcheckers[31477]: Netlink reflector reports IP 10.0.7.65 added
May 31 11:00:29 zhsq1 Keepalived_vrrp[31478]: VRRP_Instance(VI_1) Sending gratuitous ARPs on em1 for 10.0.7.65
abaixo é o log do servidor de backup:
May 31 11:01:50 zhsq2 Keepalived[31250]: Starting Keepalived v1.2.7 (05/20,2013)
May 31 11:01:50 zhsq2 Keepalived[31251]: Starting Healthcheck child process, pid=31252
May 31 11:01:50 zhsq2 Keepalived[31251]: Starting VRRP child process, pid=31253
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Interface queue is empty
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: No such interface, em2
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Netlink reflector reports IP 10.0.7.61 added
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Netlink reflector reports IP fe80::92b1:1cff:fe4c:b8b7 added
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Registering Kernel netlink reflector
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Registering Kernel netlink command channel
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Interface queue is empty
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: No such interface, em2
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Netlink reflector reports IP 10.0.7.61 added
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Netlink reflector reports IP fe80::92b1:1cff:fe4c:b8b7 added
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Registering Kernel netlink reflector
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Registering Kernel netlink command channel
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Registering gratuitous ARP shared channel
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Opening file '/etc/keepalived/keepalived.conf'.
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Configuration is using : 4661 Bytes
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Opening file '/etc/keepalived/keepalived.conf'.
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Configuration is using : 63856 Bytes
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: Using LinkWatch kernel netlink reflector...
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) Entering BACKUP STATE
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: VRRP sockpool: [ifindex(2), proto(112), fd(11,12)]
May 31 11:01:50 zhsq2 Keepalived_healthcheckers[31252]: Using LinkWatch kernel netlink reflector...
May 31 11:01:50 zhsq2 Keepalived_vrrp[31253]: VRRP_Script(chk_http_port) succeeded
May 31 11:01:54 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) Transition to MASTER STATE
May 31 11:01:55 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) Entering MASTER STATE
May 31 11:01:55 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) setting protocol VIPs.
May 31 11:01:55 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) Sending gratuitous ARPs on em1 for 10.0.7.65
May 31 11:01:55 zhsq2 Keepalived_healthcheckers[31252]: Netlink reflector reports IP 10.0.7.65 added
May 31 11:02:00 zhsq2 Keepalived_vrrp[31253]: VRRP_Instance(VI_1) Sending gratuitous ARPs on em1 for 10.0.7.65
o conf de manutenção de status do servidor master está abaixo:
vrrp_script chk_http_port {
script "/opt/nginx/nginx_pid.sh"
interval 2
weight 2
}
vrrp_instance VI_1 {
state MASTER
#nopreempt
interface em1
virtual_router_id 51
priority 151
mcast_src_ip 10.0.7.60
track_interface {
em1
}
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
chk_http_port
}
virtual_ipaddress {
10.0.7.65 dev em1
}
}
o conf de manutenção de vida do servidor de backup está abaixo:
vrrp_script chk_http_port {
script "/opt/nginx/nginx_pid.sh"
interval 2
weight 2
}
vrrp_instance VI_1 {
state BACKUP
interface em1
virtual_router_id 51
priority 100
mcast_src_ip 10.0.7.61
track_interface {
em1
}
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
chk_http_port
}
virtual_ipaddress {
10.0.7.65 dev em1
}
}
o arquivo chk_http_port está abaixo:
NGINX_PROCESS='ps -C nginx --no-header | wc -l'
if [ $NGINX_PROCESS -eq 0 ]; then
/usr/local/nginx/sbin/nginx
sleep 3
if [ 'ps -C nginx --no-header | wc -l' -eq 0 ]; then
killall keepalived
fi
fi
Por favor me ajude.
Muito obrigado.
O problema foi resolvido.
O problema foi da configuração do switch. Quando o modo de filtro multicast é filtrar tudo, o problema aconteceu. O Keepalived corre O.K. quando o modo de filtro multicast é encaminhado para todos.
Obrigado pela ajuda.
Os pacotes não estão passando entre máquinas na interface em1 (causando um cenário de split brain como Mike afirma ).
Veja um exemplo de como um dos pacotes se parece:
Frame 2: 54 bytes on wire (432 bits), 54 bytes captured (432 bits)
Arrival Time: Jun 1, 2013 03:39:50.709520000 UTC
Epoch Time: 1370057990.709520000 seconds
[Time delta from previous captured frame: 0.000970000 seconds]
[Time delta from previous displayed frame: 0.000970000 seconds]
[Time since reference or first frame: 0.000970000 seconds]
Frame Number: 2
Frame Length: 54 bytes (432 bits)
Capture Length: 54 bytes (432 bits)
[Frame is marked: False]
[Frame is ignored: False]
[Protocols in frame: eth:ip:vrrp]
Ethernet II, Src: 00:25:90:83:b0:07 (00:25:90:83:b0:07), Dst: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
Destination: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
Address: 01:00:5e:00:00:12 (01:00:5e:00:00:12)
.... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: 00:25:90:83:b0:07 (00:25:90:83:b0:07)
Address: 00:25:90:83:b0:07 (00:25:90:83:b0:07)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol Version 4, Src: 10.0.10.11 (10.0.10.11), Dst: 224.0.0.18 (224.0.0.18)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00: Not-ECT (Not ECN-Capable Transport))
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..00 = Explicit Congestion Notification: Not-ECT (Not ECN-Capable Transport) (0x00)
Total Length: 40
Identification: 0x8711 (34577)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don't fragment: Not set
..0. .... = More fragments: Not set
Fragment offset: 0
Time to live: 255
Protocol: VRRP (112)
Header checksum: 0x4037 [correct]
[Good: True]
[Bad: False]
Source: 10.0.10.11 (10.0.10.11)
Destination: 224.0.0.18 (224.0.0.18)
Virtual Router Redundancy Protocol
Version 2, Packet type 1 (Advertisement)
0010 .... = VRRP protocol version: 2
.... 0001 = VRRP packet type: Advertisement (1)
Virtual Rtr ID: 254
Priority: 151 (Non-default backup priority)
Addr Count: 1
Auth Type: No Authentication (0)
Adver Int: 1
Checksum: 0x3c01 [correct]
IP Address: 10.0.0.254 (10.0.0.254)
Eu só vim aqui procurando ajuda com o mesmo problema, mas nenhuma das outras respostas ajudou. Eu rastreei meu problema, então deixarei aqui para futuros pesquisadores da web.
O cenário em que você está se candidatando, "split brain", como eles chamam, é causado por apenas como outra resposta disse: a comunicação entre os dois keepalived, especificamente os pedidos VRRP multicast, estão falhando.
Para mim, o problema real foi que eu estava testando com a configuração de VMs pela libvirt com uma rede macvtap, que bloqueia por padrão as solicitações multicast de entrada.
A solução para mim era fazer "virsh net-edit mymacvtapnetwork" e depois alterar a primeira linha, <network>
, para <network trustGuestRxFilters='yes'>
Para mais informações sobre a configuração trustGuestRxFilters, consulte os links:
Eu também pude ver isso como o OP, rodando tcpdump host 224.0.0.18
e vi o pedido VRRP sendo enviado de cada servidor, mas não sendo recebido pelo outro.
No meu caso, o problema era o SELINUX, eu o trublei pelo comando tcpdump sobre a interface onde o VIP estava configurado. Depois de liberar o iptables o problema foi resolvido.
Tags networking nginx linux keepalived