Média de carga alta incomum (devido ao pico de espera de E / S? irqs?)

2

Eu tenho um problema com a alta média de carga ( ~2 ) no meu computador (laptop pessoal) por um longo tempo agora. Estou executando o Arch Linux. Se bem me lembro, o problema começou com uma certa atualização do kernel, inicialmente eu pensei que estava relacionado a esse bug . O problema não foi resolvido, quando o bug foi corrigido. Eu realmente não me importei, pois achei que ainda é um bug, porque a performance não pareceu sofrer. O que me deixou curioso é que, recentemente, tive um momento de super baixa carga média ( ~0 ) enquanto ocioso. Depois de uma reinicialização, tudo voltou ao "normal", com alta média de carga. Então comecei a investigar:

 % uptime
 14:31:04 up  2:22,  1 user,  load average: 1.96, 1.98, 1.99

Até agora nada de novo. Então eu tentei top:

 % top -b -n 1
top - 14:33:52 up  2:25,  1 user,  load average: 2.02, 2.07, 2.02
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
%Cpu0  :   2.6/0.9     3[||||                                                    ]
%Cpu1  :   2.7/0.9     4[||||                                                    ]
%Cpu2  :   2.7/1.0     4[||||                                                    ]
%Cpu3  :   2.7/0.8     3[||||                                                    ]
GiB Mem :228125107552256.0/7.712    [
GiB Swap:  0.0/7.904    [                                                    ]

  PID USER  PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND
    2 root  20   0    0.0m   0.0m   0.0  0.0   0:00.00 S kthreadd
  404 root  20   0    0.0m   0.0m   0.0  0.0   0:01.09 D  '- rtsx_usb_ms_2
 1854 root  20   0    0.0m   0.0m   0.0  0.0   0:06.03 D  '- kworker/0:2

Eu recortei todos os processos e threads do kernel, exceto os dois. Aqui podemos ver alguns tópicos de kernel suspeitos (estado D). E algum valor Mem suspeito (veja editar) ...

Olhando para CPU:

 % mpstat
Linux 4.13.12-1-ARCH (arch) 30.11.2017  _x86_64_    (4 CPU)

14:36:09     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
14:36:09     all    2.66    0.00    0.88    1.56    0.00    0.01    0.00    0.00    0.00   94.90
 % sar -u 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017  _x86_64_    (4 CPU)

14:37:04    CPU %user     %nice   %system   %iowait    %steal     %idle
14:37:05    all  1.00      0.00      0.75      0.00  0.00     98.25
14:37:06    all  1.76      0.00      0.50      0.00  0.00     97.74
14:37:07    all  1.00      0.00      0.25      0.00  0.00     98.75
14:37:08    all  0.50      0.00      0.50      0.00  0.00     99.00
14:37:09    all  0.50      0.00      0.50      0.25  0.00     98.75
14:37:10    all  0.50      0.00      0.50      6.03  0.00     92.96
14:37:11    all  0.75      0.00      0.50     11.75  0.00     87.00
14:37:12    all  0.50      0.00      0.25      0.00  0.00     99.25
[ . . . ]
14:37:21    all  1.26      0.00      0.76      0.00  0.00     97.98
14:37:22    all  0.75      0.00      0.25      2.26  0.00     96.73
14:37:23    all  0.50      0.00      0.50     16.83  0.00     82.16
14:37:24    all  0.75      0.00      0.50      0.00  0.00     98.74
14:37:25    all  0.50      0.00      0.50      0.00  0.00     98.99
14:37:26    all  0.76      0.00      0.50      7.56  0.00     91.18
14:37:27    all  0.25      0.00      0.51      0.00  0.00     99.24
14:37:28    all  1.00      0.00      0.75      0.25  0.00     98.00
14:37:29    all  0.25      0.00      0.76      0.00  0.00     98.99
14:37:30    all  0.75      0.00      0.50      0.00  0.00     98.74
14:37:31    all  0.75      0.00      0.50      3.27  0.00     95.48
14:37:32    all  0.51      0.00      0.51     13.16  0.00     85.82
14:37:33    all  0.75      0.00      0.50      0.25  0.00     98.49
14:37:34    all  1.26      0.00      0.75      0.00  0.00     97.99
Average:    all  0.71      0.00      0.56      2.06  0.00     96.67

revela alguns picos na espera de E / S. O melhor palpite até agora. Olhando mais de perto:

 % iostat -x 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017  _x86_64_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       2.60    0.00    0.87    1.55    0.00   94.98

Device:     rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       0.93     3.00    3.71    1.94    95.04   102.27    69.91     0.60  106.78   16.56  279.32  14.47   8.17

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       0.75    0.00    0.75    0.25    0.00   98.25

Device:     rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       0.00     0.00    0.00    1.00     0.00     0.00     0.00     0.01   13.00    0.00   13.00  10.00   1.00

[ . . . ]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       0.50    0.00    0.50   17.04    0.00   81.95

Device:     rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       0.00     8.00    0.00    2.00     0.00    40.00    40.00     0.69  346.50    0.00  346.50 346.50  69.30

[ . . . ]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       0.25    0.00    0.50    7.29    0.00   91.96

[ . . . ]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       1.00    0.00    0.75   16.96    0.00   81.30

Device:     rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       0.00     5.00    0.00    2.00     0.00    28.00    28.00     0.71  357.00    0.00  357.00 356.50  71.30

[ . . . ]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       0.50    0.00    0.50    0.00    0.00   99.00

Device:     rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Olhando para processos com sono ininterrupto:

 % for x in 'seq 1 1 10'; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D   404 [rtsx_usb_ms_2]
D  1854 [kworker/0:2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  1854 [kworker/0:2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  1854 [kworker/0:2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
----
D   404 [rtsx_usb_ms_2]
D  1854 [kworker/0:2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  1854 [kworker/0:2]
D  2877 [kworker/0:0]
----
D   404 [rtsx_usb_ms_2]
D  3177 [kworker/u32:4]
----

e a última coisa que fiz:

 % vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 5010040 123612 1220080    0    0    23    25  111  433  3  1 95  2  0
 0  0      0 5006256 123612 1224164    0    0     0    96  186  839  1  1 97  1  0
 1  0      0 5006132 123612 1224164    0    0     0 0  175  714  1  0 99  0  0
 0  0      0 5003156 123612 1224156    0    0     0 0  234 1009  2  1 98  0  0
 0  0      0 5003156 123612 1224156    0    0     0 0  161  680  0  0 99  0  0
 0  1      0 5003156 123616 1224156    0    0     0    60  214  786  1  1 94  5  0
 0  0      0 5003280 123620 1224156    0    0     0 4  226  776  1  0 88 11  0
 1  0      0 5003156 123620 1224156    0    0     0 0  210  733  1  0 99  0  0
 0  0      0 5005388 123620 1224156    0    0     0 0  159  747  1  0 99  0  0
 0  0      0 5005388 123620 1224156    0    0     0 0  233  803  1  0 99  0  0
 0  0      0 5005512 123620 1224156    0    0     0 0  152  670  1  0 99  0  0
 0  0      0 5009664 123620 1220060    0    0     0 0  240  914  1  1 99  0  0
 0  0      0 5009540 123620 1220060    0    0     0 0  237  833  1  1 99  0  0
 0  0      0 5009664 123620 1220060    0    0     0 0  166  999  1  1 99  0  0
 0  1      0 5009664 123620 1220060    0    0     0 4  168  700  1  0 88 11  0
 0  0      0 5009540 123628 1220060    0    0     0    12  207  778  1  1 91  8  0
 0  0      0 5009788 123628 1220064    0    0     0 0  189  717  0  1 99  0  0
 0  0      0 5009664 123628 1220064    0    0     0 0  243 1453  1  1 98  0  0
 0  0      0 5009044 123628 1220576    0    0     0 0  166  708  1  0 99  0  0
 0  0      0 5009168 123628 1220576    0    0     0 0  146  663  1  0 99  0  0
 0  0      0 5009540 123628 1220064    0    0     0 0  175  705  1  1 99  0  0
 0  1      0 5009292 123632 1220128    0    0     0 8  223  908  1  0 99  0  0
^C

Agora, ainda não sei qual é o problema, mas parece que ele vem de algumas operações de E / S de pico. Existem alguns tópicos de kernel suspeitos. Alguma idéia adicional? O que mais eu poderia fazer para investigar?

edit: O valor de Mem parece estranho, mas ocorreu recentemente, uma semana atrás, tudo parecia normal. E

 % free          
              total        used        free      shared  buff/cache   available
Mem:        8086240     1913860     4824764      133880     1347616     6231856
Swap:       8288252           0     8288252

parece estar bem.

edit2: Primeiros resultados do teste do sar monitorando meu sistema (muito frequentemente, intervalos de 1 segundo, mas por um curto período, para obter os picos):

Linux 4.13.12-1-ARCH (arch)     01.12.2017  _x86_64_    (4 CPU)

12:36:25        CPU     %user     %nice   %system   %iowait    %steal     %idle
12:36:26        all      0.50      0.00      0.50      0.00      0.00     99.00
12:36:27        all      0.50      0.00      0.50      0.25      0.00     98.74
12:36:28        all      0.50      0.00      0.75      0.00      0.00     98.75
12:36:29        all      0.50      0.00      0.25      7.52      0.00     91.73
12:36:30        all      0.25      0.00      0.75      9.77      0.00     89.22
12:36:31        all      0.25      0.00      0.75      0.00      0.00     98.99
12:36:32        all      1.00      0.00      0.50      0.25      0.00     98.25
12:36:33        all      1.00      0.00      1.00      0.00      0.00     98.00
12:36:34        all      0.25      0.00      0.25      0.25      0.00     99.24
12:36:35        all      0.50      0.25      0.75     33.25      0.00     65.25
12:36:36        all      0.50      0.00      0.75      0.25      0.00     98.50
12:36:37        all      0.75      0.00      0.25      0.00      0.00     99.00
12:36:38        all      0.25      0.00      0.50      0.00      0.00     99.24
12:36:39        all      0.50      0.00      0.50      0.00      0.00     99.00
12:36:40        all      0.50      0.25      0.50     10.75      0.00     88.00
Average:        all      0.52      0.03      0.57      4.16      0.00     94.72

Rede ( -n ) parece estar bem. Olhando para dispositivos ( -d ) revela:

Linux 4.13.12-1-ARCH (arch)     01.12.2017  _x86_64_    (4 CPU)

12:36:25          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:36:26       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:26       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
[ . . . ]
12:36:29       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-0      2.00      0.00     88.00     44.00      0.41    355.00    207.00     41.40
12:36:30       dev8-1      2.00      0.00     88.00     44.00      0.41    355.00    207.00     41.40
12:36:30       dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:30       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:31       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:31       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
[ . . . ]
12:36:34       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-0      2.00      0.00     24.00     12.00      0.70    348.50    348.00     69.60
12:36:35       dev8-1      2.00      0.00     24.00     12.00      0.70    348.50    348.00     69.60
12:36:35       dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:35       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:36       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:36:36       dev8-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
[ . . . ]
12:36:40       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-0      0.27      0.00      7.47     28.00      0.12    351.75    455.75     12.15
Average:       dev8-1      0.27      0.00      7.47     28.00      0.12    351.75    455.75     12.15
Average:       dev8-2      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-4      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-5      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-6      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:       dev8-7      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

e -b dão:

Linux 4.13.12-1-ARCH (arch)     01.12.2017  _x86_64_    (4 CPU)

12:36:25          tps      rtps      wtps   bread/s   bwrtn/s
12:36:26         0.00      0.00      0.00      0.00      0.00
12:36:27         0.00      0.00      0.00      0.00      0.00
12:36:28         0.00      0.00      0.00      0.00      0.00
12:36:29         0.00      0.00      0.00      0.00      0.00
12:36:30         2.00      0.00      2.00      0.00     88.00
12:36:31         0.00      0.00      0.00      0.00      0.00
12:36:32         0.00      0.00      0.00      0.00      0.00
12:36:33         0.00      0.00      0.00      0.00      0.00
12:36:34         0.00      0.00      0.00      0.00      0.00
12:36:35         2.00      0.00      2.00      0.00     24.00
12:36:36         0.00      0.00      0.00      0.00      0.00
12:36:37         0.00      0.00      0.00      0.00      0.00
12:36:38         0.00      0.00      0.00      0.00      0.00
12:36:39         0.00      0.00      0.00      0.00      0.00
12:36:40         0.00      0.00      0.00      0.00      0.00
Average:         0.27      0.00      0.27      0.00      7.47

Então, presumo que o problema parece estar relacionado ao meu disco rígido (?). Como a E / S está na partição 1 (minha partição raiz), ela deve estar em algum lugar fora de /var , que possui uma partição extra. As outras partições são partições de dados e não relacionadas ao sistema.

edit3: Ainda mais dados para esse pico específico: a paginação parece bem (da minha perspectiva com conhecimento limitado)

Linux 4.13.12-1-ARCH (arch)     01.12.2017  _x86_64_    (4 CPU)

12:36:25     pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
12:36:26         0.00      0.00      0.00      0.00   2233.00      0.00      0.00      0.00      0.00
12:36:27         0.00      0.00      0.00      0.00     88.00      0.00      0.00      0.00      0.00
12:36:28         0.00      0.00    766.00      0.00    185.00      0.00      0.00      0.00      0.00
12:36:29         0.00     40.00      0.00      0.00     47.00      0.00      0.00      0.00      0.00
12:36:30         0.00      4.00      0.00      0.00     45.00      0.00      0.00      0.00      0.00
12:36:31         0.00      0.00      1.00      0.00     46.00      0.00      0.00      0.00      0.00
12:36:32         0.00      0.00      5.00      0.00    560.00      0.00      0.00      0.00      0.00
12:36:33         0.00      0.00      2.00      0.00     85.00      0.00      0.00      0.00      0.00
12:36:34         0.00      0.00      2.00      0.00     47.00      0.00      0.00      0.00      0.00
12:36:35         0.00     12.00      0.00      0.00     44.00      0.00      0.00      0.00      0.00
12:36:36         0.00      0.00      0.00      0.00     47.00      0.00      0.00      0.00      0.00
12:36:37         0.00      0.00      2.00      0.00     45.00      0.00      0.00      0.00      0.00
12:36:38         0.00      0.00      0.00      0.00     47.00      0.00      0.00      0.00      0.00
12:36:39         0.00      0.00      0.00      0.00     77.00      0.00      0.00      0.00      0.00
12:36:40         0.00      8.00      0.00      0.00     47.00      0.00      0.00      0.00      0.00
Average:         0.00      4.27     51.87      0.00    242.87      0.00      0.00      0.00      0.00

Parece que os arquivos foram criados durante esse pico ( -v ):

Linux 4.13.12-1-ARCH (arch)     01.12.2017  _x86_64_    (4 CPU)

12:36:25    dentunusd   file-nr  inode-nr    pty-nr
12:36:26       186520      4480    195468         2
[ . . . ]
12:36:34       186520      4480    195468         2
12:36:35       186520      4512    195468         2
[ . . . ]
12:36:40       186520      4512    195468         2
Average:       186520      4493    195468         2

edit4: Parece que alguns irq são responsáveis. Executar iotop -o -a (mostrar apenas processos com i / o e acumulá-los, portanto, mantenha todos os processos que tinham i / o desde o início do programa) resultou em:

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND     
    7 be/4 root          0.00 B      0.00 B  0.00 % 99.99 % [ksoftirqd/0]
   17 be/4 root          0.00 B      0.00 B  0.00 % 99.99 % [ksoftirqd/1]
   23 be/4 root          0.00 B      0.00 B  0.00 % 99.99 % [ksoftirqd/2]
   29 be/4 root          0.00 B      0.00 B  0.00 % 99.99 % [ksoftirqd/3]
  292 rt/4 root          0.00 B      0.00 B  0.00 % 99.99 % [i915/signal:0]
[ . . . ]

Então, isso é uma coisa? Como eu poderia continuar ...?

    
por nox 30.11.2017 / 15:48

1 resposta

0

Tente as colunas htop (ativar as colunas IO_READ_RATE e IO_WRITE_RATE), iotop e inotifywait.

    
por 01.12.2017 / 20:04