High latency / jitter between Cisco switches in two locations. How to troubleshoot?

I'm receiving complaints from users about poor network application performance between two parts of a large warehouse. The software is a curses-based terminal application running on a Linux server. The clients are PCs running a telnet or SSH client. The problem started a day ago, with no (known) recent changes to the environment.

The core switch is a Cisco Catalyst 4507R-E in the MDF, linked to a 4-member stack of Cisco Catalyst 2960 switches in the IDF. They are connected via multimode fiber. The servers are in the MDF. The impacted clients are in the IDF.

Pinging the management address of the 2960 stack in that building from the Linux application server shows high variance and a lot of latency:

--- shipping-2960.mdmarra.local ping statistics ---
864 packets transmitted, 864 received, 0% packet loss, time 863312ms
rtt min/avg/max/mdev = 0.521/5.317/127.037/8.698 ms

However, pings from the application server to the client PCs are considerably more consistent:

--- charles-pc.mdmarra.local ping statistics ---
76 packets transmitted, 76 received, 0% packet loss, time 75001ms
rtt min/avg/max/mdev = 0.328/0.481/1.355/0.210 ms
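
To put numbers on the difference between the two runs, the summary lines can be parsed and compared. This is a minimal Python sketch and not part of the original troubleshooting. The helper names `parse_rtt` and `describe` are made up for illustration, and the two input strings are copied from the ping output above.

```python
import re

# Summary lines copied from the two ping runs above.
SWITCH_MGMT = "rtt min/avg/max/mdev = 0.521/5.317/127.037/8.698 ms"
CLIENT_PC = "rtt min/avg/max/mdev = 0.328/0.481/1.355/0.210 ms"

RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(line):
    """Return (min, avg, max, mdev) in milliseconds from a Linux ping summary line."""
    match = RTT_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping rtt summary: {line!r}")
    return tuple(float(value) for value in match.groups())


def describe(label, line):
    """Format avg, mdev and the max/min spread for one ping target."""
    lo, avg, hi, mdev = parse_rtt(line)
    return f"{label}: avg={avg:.3f} ms, mdev={mdev:.3f} ms, max/min={hi / lo:.0f}x"


if __name__ == "__main__":
    print(describe("switch management IP", SWITCH_MGMT))
    print(describe("client PC", CLIENT_PC))
```

A wide spread on the management IP next to a tight spread on a client behind the same stack would suggest that the delay comes from how the switch itself answers pings, not from the forwarding path.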

None of the relevant Linux interfaces or switchports show errors (see the output below the question).

How can I troubleshoot this?

  • Is there an easy way to determine port activity?
  • Is ping variance on the switch's management IP the wrong thing to measure?
  • Could this be the result of a rogue PC?
  • Since the problem is isolated to one part of the building, is there anything else I should be checking? Other users in the warehouse are fine and have not had problems.

Edit:

I later found that the CPU utilization of the Cisco 2960 is extremely high because of the bug detailed here.

From the 2960 stack:

shipping-2960#sh int GigabitEthernet1/0/52
GigabitEthernet1/0/52 is up, line protocol is up (connected) 
  Hardware is Gigabit Ethernet, address is b414.894a.09b4 (bia b414.894a.09b4)
  Description: TO_MDF_4507
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
     reliability 255/255, txload 13/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive not set
  Full-duplex, 1000Mb/s, link type is auto, media type is 1000BaseSX SFP
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 441
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 3053000 bits/sec, 613 packets/sec
  5 minute output rate 51117000 bits/sec, 4815 packets/sec
     981767797 packets input, 615324451566 bytes, 0 no buffer
     Received 295141786 broadcasts (286005510 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 286005510 multicast, 0 pause input
     0 input packets with dribble condition detected
     6372280523 packets output, 8375642643516 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
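
As a rough answer to the port-activity question, the 5-minute rates in this output can be turned into a percentage of link bandwidth. This is a minimal Python sketch, assuming the output has been captured as text. The function names `link_utilization` and `load_fraction` are hypothetical, and the excerpt is copied from the output above.

```python
import re

# Excerpt of the "show interface" output above (GigabitEthernet1/0/52).
SHOW_INT = """\
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 13/255, rxload 1/255
  5 minute input rate 3053000 bits/sec, 613 packets/sec
  5 minute output rate 51117000 bits/sec, 4815 packets/sec
"""


def link_utilization(text):
    """Return (input_pct, output_pct) of link bandwidth over the 5-minute window."""
    bw_bps = int(re.search(r"BW (\d+) Kbit", text).group(1)) * 1000
    in_bps = int(re.search(r"input rate (\d+) bits/sec", text).group(1))
    out_bps = int(re.search(r"output rate (\d+) bits/sec", text).group(1))
    return 100.0 * in_bps / bw_bps, 100.0 * out_bps / bw_bps


def load_fraction(text, name):
    """Convert a txload/rxload value such as 13/255 into a percentage."""
    num, den = re.search(rf"{name} (\d+)/(\d+)", text).groups()
    return 100.0 * int(num) / int(den)


if __name__ == "__main__":
    in_pct, out_pct = link_utilization(SHOW_INT)
    print(f"input {in_pct:.2f}%, output {out_pct:.2f}%")
    print(f"txload {load_fraction(SHOW_INT, 'txload'):.2f}%")
```

For this port the sketch gives roughly 0.3% inbound and 5.1% outbound, which agrees with txload 13/255. A 5-minute average can hide short bursts, so this figure alone does not explain the 441 output drops.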

Additional output:

Cisco 4507R-E CPU utilization - sorted.

Cisco 2960 CPU utilization - sorted.

TCAM utilization of the 2960. Not available on the 4507.

shipping-2960# show platform tcam utilization

CAM Utilization for ASIC# 0                      Max            Used
                                             Masks/Values    Masks/values

 Unicast mac addresses:                       8412/8412        335/335   
 IPv4 IGMP groups + multicast routes:          384/384           1/1     
 IPv4 unicast directly-connected routes:       320/320          28/28    
 IPv4 unicast indirectly-connected routes:       0/0            28/28    
 IPv6 Multicast groups:                        320/320          11/11    
 IPv6 unicast directly-connected routes:       256/256           1/1     
 IPv6 unicast indirectly-connected routes:       0/0             1/1     
 IPv4 policy based routing aces:                32/32           12/12    
 IPv4 qos aces:                                384/384          42/42    
 IPv4 security aces:                           384/384          33/33    
 IPv6 policy based routing aces:                16/16            8/8     
 IPv6 qos aces:                                 60/60           31/31    
 IPv6 security aces:                           128/128           9/9     
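
To check whether any TCAM region is close to exhaustion, the table can be reduced to a used/max percentage per row. This is a minimal Python sketch. The name `tcam_usage` is hypothetical, only a few rows from the output above are included, and it compares the "values" half of each Masks/Values pair.

```python
import re

# A few rows copied from the "show platform tcam utilization" output above.
TCAM = """\
 Unicast mac addresses:                       8412/8412        335/335
 IPv4 unicast indirectly-connected routes:       0/0            28/28
 IPv4 qos aces:                                384/384          42/42
 IPv6 policy based routing aces:                16/16            8/8
 IPv6 qos aces:                                 60/60           31/31
"""

# name: max_masks/max_values   used_masks/used_values
ROW_RE = re.compile(r"^\s*(.+?):\s+(\d+)/(\d+)\s+(\d+)/(\d+)\s*$")


def tcam_usage(text):
    """Map each TCAM region to its used/max percentage (None when max is 0)."""
    usage = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # skip headers and blank lines
        name = match.group(1)
        max_values = int(match.group(3))
        used_values = int(match.group(5))
        usage[name] = 100.0 * used_values / max_values if max_values else None
    return usage


if __name__ == "__main__":
    for name, pct in tcam_usage(TCAM).items():
        shown = "n/a (max is 0)" if pct is None else f"{pct:.1f}%"
        print(f"{name}: {shown}")
```

Rows whose maximum is reported as 0/0 are returned as `None` instead of a percentage, since the output gives no dedicated limit to compare against.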

Cisco 2960 CPU utilization history:

shipping-2960#show processes cpu history

    3333333444443333344444444443333333333444443333344444444443
    9977777111119999966666222229999977777555559999911111000008
100                                                           
 90                                                           
 80                                                           
 70                                                           
 60                                                           
 50                  *****               *****                
 40 **********************************************************
 30 **********************************************************
 20 **********************************************************
 10 **********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5    
               CPU% per second (last 60 seconds)

    4488887787444454444787888444444454677774444444447888544444
    6401207808656506776708000447546664789977697589953201636647
100                                                           
 90                                                           
 80   *###*##*         *#*##*          *#**          ###      
 70   #######*         *#####         *###*         *###      
 60   #######*         *#####       * *####         *###*     
 50 * ########*********######  ** *** *####*********####* ** *
 40 ##########################################################
 30 ##########################################################
 20 ##########################################################
 10 ##########################################################
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5    
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

    8889888888888888988888889888888888888888888888888888888888888888898889
    2322334378633453364454472653323431254225563228261399243233354222402310
100                                                                       
 90    *    ***   * **  *  ****        *   ***   * *  **       *     *   *
 80 *#############################*********************************#******
 70 *#####################################################################
 60 *#####################################################################
 50 ######################################################################
 40 ######################################################################
 30 ######################################################################
 20 ######################################################################
 10 ######################################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
             0    5    0    5    0    5    0    5    0    5    0    5    0 
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%
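
The digit rows above each graph are read vertically: each column is one sample, with the tens digit on top and the ones digit below. This is a minimal Python sketch that decodes the per-second rows from the output above. The function name `decode_columns` is made up for illustration.

```python
# The two digit rows above the per-second graph, copied from the output above.
TENS = "3333333444443333344444444443333333333444443333344444444443"
ONES = "9977777111119999966666222229999977777555559999911111000008"


def decode_columns(*rows):
    """Read the digit rows column by column; each column is one CPU% sample."""
    width = max(len(row) for row in rows)
    padded = [row.ljust(width) for row in rows]
    samples = []
    for column in zip(*padded):
        digits = "".join(column).strip()
        samples.append(int(digits) if digits else 0)
    return samples


if __name__ == "__main__":
    per_second = decode_columns(TENS, ONES)
    print(per_second)
    print("min", min(per_second), "max", max(per_second))
```

Decoded this way, the last 60 seconds stay between 37% and 46%, which matches the graph: the 40 row is filled throughout and only a few columns reach the 50 row.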
    
by ewwhite, 10.05.2013 / 14:46

1 answer

Cisco switches put ICMP at the bottom of their priority list, so ping replies from the switch itself are delayed when its CPU is busy. We get the same results if we ping a busy 3750-X.

You need to look at the system utilization on the switches, as I suspect they are so busy that they are processing packets in software. Are you running any kind of Layer 3 services on them?

There is a fairly serious bug in IOS 12.2.53:

CSCth24278 (Catalyst 2960-S switches)

The CPU utilization on the switch remains high (50 to 60 percent) when the switch is not being accessed by a telnet or a console session. When you telnet or console into the switch, the CPU utilization goes down.

There is no workaround.

Upgrade to 12.2.58-SE1 or later to fix this.

answered 10.05.2013 / 14:58