Failover cluster failed to fail over due to a mysterious IP conflict?

5

I am having a mysterious problem with my failover cluster:

Cluster name: PrintCluster01.domain.com
Members: PrintServer01.domain.com and PrintServer02.domain.com

In Failover Cluster Management, under Cluster Events, I received critical errors 1135 and 1177:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. 
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.


Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

After further investigation, I found an interesting error in the Event Viewer on PrintServer02, logged at about the same time as the first critical error message:

Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.

192.168.127.142 is a secondary IP address of PrintServer01. How can a conflict on that address be caused by PrintServer01 itself, one of the cluster nodes? The details are below:

**From PrintServer01**
Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

I have checked on all cluster members that all IP addresses are now unique.

I am also certain that the IP addresses are static rather than assigned by DHCP, as the IPCONFIG results below show:

From **PrintServer01** (the Active Node)
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer01
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.10
 Secondary WINS Server . . . . . . : 192.168.127.11
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled


From **PrintServer02**
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer02
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.11
 Secondary WINS Server . . . . . . : 192.168.127.10
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled

Any help would be greatly appreciated.

Thanks, AWT

    
by Senior Systems Engineer 16.06.2011 / 13:14

4 answers

2

The IP address conflict error occurs when more than one node in a cluster tries to bring a resource group (and its associated IPs) online at the same time.

This can happen if the cluster nodes momentarily lose contact with each other. Each node assumes the other has failed, so the 'passive' node brings all resource groups online while they are in fact still online on the 'active' node.

I have seen this problem in our VMware environment when one of the ESX(i) hosts is overloaded. Sometimes, even during HBA rescans, the MSCS nodes suddenly lose contact and this confusion occurs.
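The symptom described above, one IP online on two nodes at once, can be checked mechanically once you have the list of addresses bound on each node. The following is a minimal Python sketch, not a cluster tool: the `find_shared_ips` helper is hypothetical, the addresses come from the ipconfig output in the question, and the 192.168.127.142 entry on PrintServer02 is an assumed illustration of the passive node having brought the resource group online.

```python
from collections import defaultdict


def find_shared_ips(bound_ips):
    """Return {ip: sorted node names} for every IP bound on more than one node."""
    owners = defaultdict(set)
    for node, ips in bound_ips.items():
        for ip in ips:
            owners[ip].add(node)
    return {ip: sorted(nodes) for ip, nodes in owners.items() if len(nodes) > 1}


# Addresses taken from the question. The last entry on PrintServer02 is
# hypothetical: it models the passive node also claiming the service IP.
bound_ips = {
    "PrintServer01": ["192.168.127.155", "192.168.127.88", "192.168.127.142"],
    "PrintServer02": ["192.168.127.172", "192.168.127.119", "192.168.127.142"],
}

print(find_shared_ips(bound_ips))
# {'192.168.127.142': ['PrintServer01', 'PrintServer02']}
```

An empty result means no address is claimed twice at the moment the lists were collected. It does not rule out a brief overlap during a communication loss.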

    
by 25.01.2013 / 15:35
3

Use the script on this page to query the MAC addresses of the VMs:

link

Match the results against the misbehaving MAC address and examine that machine carefully.
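If the linked script is not at hand, the same matching can be done against saved `ipconfig /all` output. This is a rough Python sketch and not the script referred to above: the `adapters_by_mac` helper and its regular expressions are assumptions based on the output format shown in the question, and the sample text is an abbreviated copy of the PrintServer01 output.

```python
import re

# Patterns assume the English "ipconfig /all" layout shown in the question.
ADAPTER_RE = re.compile(r"\s*(.*adapter .+):\s*$")
MAC_RE = re.compile(r"\s*Physical Address[ .]*:\s*([0-9A-Fa-f-]{17})")
IPV4_RE = re.compile(r"\s*IPv4 Address[ .]*:\s*([0-9.]+)")


def adapters_by_mac(ipconfig_text):
    """Map each physical address to its adapter name and IPv4 addresses."""
    adapters = {}
    name, current = None, None
    for line in ipconfig_text.splitlines():
        if m := ADAPTER_RE.match(line):
            name, current = m.group(1), None
        elif m := MAC_RE.match(line):
            current = {"adapter": name, "ipv4": []}
            adapters[m.group(1).upper()] = current
        elif (m := IPV4_RE.match(line)) and current is not None:
            current["ipv4"].append(m.group(1))
    return adapters


sample = """\
Ethernet adapter Local Area Connection* 8:
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
Ethernet adapter Cluster Public Network:
 Physical Address. . . . . . . . . : 00-50-56-AE-29-23
 IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
 IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
"""

conflict_mac = "00-50-56-AE-29-23"  # hardware address named in Tcpip event 4199
match = adapters_by_mac(sample)[conflict_mac]
print(match["adapter"], match["ipv4"])
```

With the question's data, the conflicting hardware address resolves to the Cluster Public Network adapter on PrintServer01, which is the adapter that also holds 192.168.127.142.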

    
by 13.07.2011 / 17:23
1

IMHO any logical service IP should have a /32 subnet mask. The network should be served by the physical IP, which should have a subnet mask matching the subnet in use.
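To make the difference concrete, here is a small Python illustration using the standard `ipaddress` module. It only shows what a /32 mask means compared with the /24 from the question. Assigning 192.168.127.142 as the /32 service address is an assumption for the example, not a statement about how the cluster's IP resources are actually configured.

```python
import ipaddress

# Hypothetical assignment: service IP as /32, physical IP as /24.
service_ip = ipaddress.ip_interface("192.168.127.142/32")
physical_ip = ipaddress.ip_interface("192.168.127.155/24")
gateway = ipaddress.ip_address("192.168.127.254")

print(service_ip.netmask)                 # 255.255.255.255
print(service_ip.network.num_addresses)   # 1
print(physical_ip.network.num_addresses)  # 256

# Only the physical address puts the rest of the subnet on-link.
print(gateway in physical_ip.network)     # True
print(gateway in service_ip.network)      # False
```

With a /32 mask the service address covers nothing but itself, so reachability of the subnet and the gateway depends only on the physical address.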

    
by 17.07.2011 / 22:58
0

I solved this problem by switching the IP to automatic assignment and then assigning it manually again. Windows then prompted me to remove non-present devices, and doing so resolved the issue.

    
by 31.12.2013 / 22:44