Nós configuramos três servidores com keepalived . Começamos a perceber algumas eleições aleatórias que não podemos explicar, então eu procuro aqui procurando por conselhos.
Aqui está a nossa configuração:
MASTER:
global_defs {
notification_email {
[email protected]
}
notification_email_from keepalived@hostname
smtp_server example.com:587
smtp_connect_timeout 30
router_id some_rate
}
vrrp_script chk_nginx {
script "killall -0 nginx"
interval 2
weight 2
}
vrrp_instance VIP_61 {
interface bond0
virtual_router_id 61
state MASTER
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass PASSWORD
}
virtual_ipaddress {
X.X.X.X
X.X.X.X
X.X.X.X
}
track_script {
chk_nginx
}
}
BACKUP1:
global_defs {
notification_email {
[email protected]
}
notification_email_from keepalived@hostname
smtp_server example.com:587
smtp_connect_timeout 30
router_id some_rate
}
vrrp_script chk_nginx {
script "killall -0 nginx"
interval 2
weight 2
}
vrrp_instance VIP_61 {
interface bond0
virtual_router_id 61
state MASTER
priority 99
advert_int 1
authentication {
auth_type PASS
auth_pass PASSWORD
}
virtual_ipaddress {
X.X.X.X
X.X.X.X
X.X.X.X
}
track_script {
chk_nginx
}
}
BACKUP2:
global_defs {
notification_email {
[email protected]
}
notification_email_from keepalived@hostname
smtp_server example.com:587
smtp_connect_timeout 30
router_id some_rate
}
vrrp_script chk_nginx {
script "killall -0 nginx"
interval 2
weight 2
}
vrrp_instance VIP_61 {
interface bond0
virtual_router_id 61
state MASTER
priority 98
advert_int 1
authentication {
auth_type PASS
auth_pass PASSWORD
}
virtual_ipaddress {
X.X.X.X
X.X.X.X
X.X.X.X
}
track_script {
chk_nginx
}
}
De vez em quando eu posso ver isso acontecendo (grepped em logs):
MASTER:
Jan 6 18:30:15 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan 6 18:30:16 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
Jan 6 18:32:37 lb-public01 Keepalived_vrrp[24380]: VRRP_Instance(VIP_61) Received lower prio advert, forcing new election
BACKUP1:
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:30:16 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE
Jan 6 18:32:37 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) forcing a new MASTER election
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:32:38 lb-public02 Keepalived_vrrp[26235]: VRRP_Instance(VIP_61) Entering BACKUP STATE
BACKUP2:
Jan 6 18:32:36 lb-public03 Keepalived_vrrp[14255]: VRRP_Script(chk_nginx) succeeded
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Transition to MASTER STATE
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Received higher prio advert
Jan 6 18:32:37 lb-public03 Keepalived_vrrp[14255]: VRRP_Instance(VIP_61) Entering BACKUP STATE
Então, o MASTER recebe o anúncio do LOWER PRIO e a NOVA eleição é iniciada. PORQUE ? Parece que o BACKUP transita para o MASTER por um curto período de tempo (com base nos logs) e, em seguida, retorna ao estado BACKUP.
Eu não tenho a menor ideia de como isso está acontecendo, então qualquer sugestão seria mais do que bem-vinda.
Além disso, descobri que existe um patch unicast no keepalived, mas não está claro para mim se suporta mais de 1 ponto unicast - no nosso caso, temos um cluster de 3 máquinas, então precisamos de mais de 1 ponto unicast.
Quaisquer sugestões sobre estas questões seriam apreciadas de forma superamazingly!