Problem with CPU utilization of the swapper process

We are having a problem with the swapper process taking about 40% of CPU utilization.

This is an HP DL360 G8 server with 16 hyperthreaded cores (32 vCPUs) running Ubuntu 16.04.

uname -a
Linux ubuntu 4.4.0-87-generic #110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

We are running 10 Chrome instances in parallel. The overall average CPU utilization by Chrome is approximately 20%, but the box as a whole is 50-60% utilized.

Output of the top command:

top - 12:28:56 up 18:22,  1 user,  load average: 26.06, 25.48, 26.25
Tasks: 531 total,  14 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 51.8 us, 16.7 sy,  0.0 ni, 30.9 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem : 32903968 total,  5360712 free,  4363668 used, 23179588 buff/cache
KiB Swap: 33521660 total, 33521660 free,        0 used. 27271240 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16450 ubuntu     20   0 1272700 371316  90316 R 128.5  1.1   0:31.28 chrome
20228 ubuntu     20   0 1035308 231700 112168 R 118.0  0.7   0:08.68 chrome
17929 ubuntu     20   0 1168144 300908  78488 R 110.8  0.9   0:19.71 chrome
20236 ubuntu     20   0  976584 181224  71084 R 107.9  0.6   0:08.52 chrome
17364 ubuntu     20   0 1094608 222588  86896 R 104.3  0.7   0:14.03 chrome
18048 ubuntu     20   0  876428 153676 103216 R  81.6  0.5   0:07.96 chrome
18917 ubuntu     20   0  906296 111216  58764 R  77.0  0.3   0:06.04 chrome
17178 ubuntu     20   0  950044 124616  57396 R  69.8  0.4   0:08.81 chrome
20231 ubuntu     20   0  975728 155644  80336 R  62.6  0.5   0:05.00 chrome
16861 ubuntu     20   0  790176 143856  94232 R  62.3  0.4   0:13.35 chrome
18247 ubuntu     20   0  789240 144924  97188 R  60.0  0.4   0:05.93 chrome
19052 ubuntu     20   0  876516  94588  57136 R  59.0  0.3   0:03.85 chrome
20816 ubuntu     20   0  862732 119076  83696 S  52.5  0.4   0:01.68 chrome
20845 ubuntu     20   0  765556 116208  89024 S  47.5  0.4   0:01.45 chrome
20881 ubuntu     20   0  767336 116076  88860 S  47.5  0.4   0:01.45 chrome
15032 ubuntu     20   0  833228 154676 100316 S  40.3  0.5   0:18.29 chrome
19242 ubuntu     20   0  807540 141460  93840 S  40.0  0.4   0:07.46 chrome
16419 ubuntu     20   0  802776 151296  98852 S  34.4  0.5   0:12.39 chrome
19746 ubuntu     20   0  802080 143508  96040 S  34.4  0.4   0:03.45 chrome
19563 ubuntu     20   0  866784 102160  57740 S  32.8  0.3   0:04.11 chrome
15606 ubuntu     20   0  916200 134464  58336 S  28.9  0.4   0:11.71 chrome
16747 ubuntu     20   0  935936  91460  58256 S  22.3  0.3   0:07.39 chrome
21507 ubuntu     20   0  809420  77648  54356 S  21.3  0.2   0:00.65 chrome
45917 root      20   0       0      0      0 S  21.3  0.0   0:19.54 kworker/7:4
24222 root      20   0       0      0      0 S  20.0  0.0   2:32.20 kworker/11:3
21235 ubuntu     20   0  807404  77448  54220 S  19.7  0.2   0:00.60 chrome
52972 root      20   0       0      0      0 S  19.0  0.0   1:07.90 kworker/23:1
37232 root      20   0       0      0      0 S  18.7  0.0   1:01.20 kworker/3:2
48449 root      20   0       0      0      0 S  18.4  0.0   0:05.88 kworker/1:4
 6863 root      20   0       0      0      0 S  17.7  0.0   0:39.47 kworker/9:1
26492 root      20   0       0      0      0 S  17.7  0.0   1:17.70 kworker/16:1
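
For reference when reading these numbers: by default (Irix mode) the per-process %CPU column in top is relative to a single logical CPU, which is why individual Chrome processes can exceed 100%, while the %Cpu(s) summary line is averaged over all 32. Below is a minimal sketch of how the per-process figures could be scaled to the whole box. The function name `cpu_share` and its arguments are made up for illustration, and it assumes the default column layout shown above (%CPU in column 9, COMMAND in column 12).

```shell
# cpu_share PREFIX NCPUS: read `top -b -n 1` output on stdin, sum the %CPU
# column (field 9) of processes whose COMMAND (field 12) starts with PREFIX,
# and print that sum as a percentage of NCPUS logical CPUs.
# Assumption: default top field layout, "." as the decimal separator.
cpu_share() {
    awk -v prefix="$1" -v ncpus="$2" '
        # Process rows start with a numeric PID; this skips the summary
        # and header lines.
        $1 ~ /^[0-9]+$/ && index($12, prefix) == 1 { sum += $9 }
        END { printf "%.1f\n", sum / ncpus }
    '
}
```

It could be used as `top -b -n 1 | cpu_share chrome "$(nproc)"`, and with `kworker` as the prefix to see how much the kworker threads add up to on the same scale.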

perf:

sudo perf record -g -a sleep 10

-   33.36%     0.00%  swapper          [kernel.kallsyms]           [k] cpu_startup_entry
   - 33.36% cpu_startup_entry
      - 31.16% call_cpuidle
         - 31.16% cpuidle_enter
            - 23.98% cpuidle_enter_state
                 23.51% intel_idle
                 0.00% leave_mm
                 0.00% poll_idle
               + 0.00% ktime_get
                 0.00% sched_idle_set_state
                 0.00% read_tsc
            + 6.02% apic_timer_interrupt
            + 1.16% reschedule_interrupt
            + 0.00% ret_from_intr
            + 0.00% call_function_interrupt
              0.00% ktime_get
              0.00% intel_idle
              0.00% sched_idle_set_state
              0.00% native_irq_return_iret
              0.00% restore_c_regs_and_iret
              0.00% retint_kernel
              0.00% common_interrupt
            + 0.00% call_function_single_interrupt
              0.00% native_iret
           0.00% cpuidle_enter_state
      + 1.06% schedule_preempt_disabled
        0.61% cpuidle_not_available
      + 0.53% cpuidle_select
      + 0.00% tick_nohz_idle_enter
      + 0.00% sched_ttwu_pending
      + 0.00% tick_nohz_idle_exit
        0.00% rcu_idle_enter
      + 0.00% arch_cpu_idle_enter
        0.00% rcu_idle_exit
        0.00% schedule
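
One cross-check that may be useful here: swapper/N (PID 0) is the per-CPU idle task, so its share in a system-wide profile can be compared with the idle time the kernel itself accounts in /proc/stat (top showed 30.9 id above). The following is a rough sketch of that comparison, not something taken from the measurements in this post. The helper name `idle_pct` is hypothetical, and it counts only the `idle` field as idle time, not `iowait`.

```shell
# idle_pct: read two aggregate "cpu" lines from /proc/stat on stdin (an
# earlier and a later sample) and print the percentage of CPU time spent
# idle between them. Assumes the two samples differ (non-zero interval).
idle_pct() {
    awk '
        $1 == "cpu" {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= 9; i++) total += $i
            if (seen++) {
                printf "%.1f\n", 100 * ($5 - prev_idle) / (total - prev_total)
            }
            prev_idle = $5
            prev_total = total
        }
    '
}
```

A possible invocation over the same 10 second window as the perf run would be `{ grep '^cpu ' /proc/stat; sleep 10; grep '^cpu ' /proc/stat; } | idle_pct`.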

The CPU backtraces for all cores show the same thing (apart from the Chrome processes):

swapper processes

[63714.313210] NMI backtrace for cpu 29
[63714.313221] CPU: 29 PID: 0 Comm: swapper/29 Not tainted 4.4.0-87-generic #110-Ubuntu
[63714.313227] Hardware name: HP ProLiant DL360p Gen8, BIOS P71 07/01/2015
[63714.313232] task: ffff88082d9aaa00 ti: ffff88082d9c4000 task.ti: ffff88082d9c4000
[63714.313239] RIP: 0010:[<ffffffff8148bb98>]  [<ffffffff8148bb98>] intel_idle+0xa8/0x130
[63714.313250] RSP: 0018:ffff88082d9c7e40  EFLAGS: 00000046
[63714.313257] RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
[63714.313262] RDX: 0000000000000000 RSI: ffff88082d9c8000 RDI: 0000000001e0a000
[63714.313268] RBP: ffff88082d9c7e60 R08: 0000000000000981 R09: 0000000000000018
[63714.313273] R10: 0000000000000022 R11: 000000000000103a R12: 0000000000000004
[63714.313286] R13: 0000000000000005 R14: 0000000000000030 R15: ffffffff81eb3658
[63714.313292] FS:  0000000000000000(0000) GS:ffff88083f740000(0000) knlGS:0000000000000000
[63714.313298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63714.313307] CR2: 000019aa9ecccee8 CR3: 0000000001e0a000 CR4: 00000000000406e0
[63714.313317] Stack:
[63714.313320]  0000000000000005 ffffffff81eb3460 ffffe8ffff940600 000039f31051e675
[63714.313322]  ffff88082d9c7ea8 ffffffff816d4e67 000000003f753bc0 ffffffff81eb3460
[63714.313324]  ffffffff81f38d00 ffff88082d9c8000 ffffe8ffff940600 ffffffff81eb3460
[63714.313325] Call Trace:
[63714.313326]  [<ffffffff816d4e67>] cpuidle_enter_state+0xe7/0x2b0
[63714.313327]  [<ffffffff816d5067>] cpuidle_enter+0x17/0x20
[63714.313329]  [<ffffffff810c46d2>] call_cpuidle+0x32/0x60
[63714.313330]  [<ffffffff816d5043>] ? cpuidle_select+0x13/0x20
[63714.313332]  [<ffffffff810c4990>] cpu_startup_entry+0x290/0x350
[63714.313333]  [<ffffffff810517b4>] start_secondary+0x154/0x190
[63714.313335] Code: 48 8b 34 25 c4 42 01 00 48 89 d1 48 8d 86 08 c0 ff ff 0f 01 c8 48 8b 86 08 c0 ff ff a8 08 75 0b b9 01 00 00 00 4c 89 f0 0f 01 c9 <65> 48 8b 04 25 c4 42 01 00 f0 80 a0 0a c0 ff ff df 0f ae f0 48

kworker processes

[63548.059703] CPU: 31 PID: 22556 Comm: kworker/31:0 Not tainted 4.4.0-87-generic #110-Ubuntu
[63548.059705] Hardware name: HP ProLiant DL360p Gen8, BIOS P71 07/01/2015
[63548.059707] task: ffff88079b7d5400 ti: ffff8807fa150000 task.ti: ffff8807fa150000
[63548.059709] RIP: 0010:[<ffffffff8183d72a>]  [<ffffffff8183d72a>] __schedule+0x3ea/0xa30
[63548.059711] RSP: 0000:ffff8807fa153df8  EFLAGS: 00000003
[63548.059713] RAX: 000000000000001f RBX: ffff88083f7d6dc0 RCX: 0000000000000000
[63548.059715] RDX: ffffffff81f38d00 RSI: 0000000000000000 RDI: 0000000000000000
[63548.059717] RBP: ffff8807fa153e38 R08: 0000000000000000 R09: 0000000000000000
[63548.059721] R10: 00000000000002eb R11: 0000000000000000 R12: 0000000000000000
[63548.059722] R13: 0000000000016dc0 R14: ffff88079b7d5970 R15: ffff88083f7d6dc0
[63548.059724] FS:  0000000000000000(0000) GS:ffff88083f7c0000(0000) knlGS:0000000000000000
[63548.059726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63548.059728] CR2: 00000000000000b8 CR3: 00000007f5661000 CR4: 00000000000406e0
[63548.059730] Stack:
[63548.059732]  ffff88083f7daf00 ffff8807e622b800 ffff88079b7d5400 ffff8807fa154000
[63548.059734]  ffff88083f7d65c0 0000000000000008 ffff88083f7d65d8 ffff88082c6d7200
[63548.059736]  ffff8807fa153e50 ffffffff8183dda5 ffff88083f7d65c0 ffff8807fa153eb8
[63548.059738] Call Trace:
[63548.059740]  [<ffffffff8183dda5>] schedule+0x35/0x80
[63548.059742]  [<ffffffff8109a9cb>] worker_thread+0xcb/0x4c0
[63548.059744]  [<ffffffff8109a900>] ? process_one_work+0x480/0x480
[63548.059747]  [<ffffffff810a0c85>] kthread+0xe5/0x100
[63548.059748]  [<ffffffff810a0ba0>] ? kthread_create_on_node+0x1e0/0x1e0
[63548.059750]  [<ffffffff8184224f>] ret_from_fork+0x3f/0x70
[63548.059758]  [<ffffffff810a0ba0>] ? kthread_create_on_node+0x1e0/0x1e0
[63548.059767] Code: 00 00 0f 85 8e 03 00 00 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f 5d c3 65 8b 05 c3 ca 7c 7e 48 8b 15 44 47 1d 00 89 c0 48 0f a3 02 <19> c0 85 c0 0f 84 c9 fd ff ff 4c 8b 2d d5 41 6d 00 4d 85 ed 74

Looking at the ACPI interrupts

grep . -r /sys/firmware/acpi/interrupts/

did not reveal anything suspicious either.
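
Since that grep prints one line per counter, most of them zero, a small filter can make it easier to see whether any single counter stands out. The sketch below is one way to do that. The function name `top_acpi_counters` is made up for illustration, and it assumes each grep line has the form `path:   count   [status...]`, with the count as the first token after the colon.

```shell
# top_acpi_counters: read `grep . -r /sys/firmware/acpi/interrupts/` output
# on stdin and print "count name" for every non-zero counter, highest first.
top_acpi_counters() {
    awk -F: '
        {
            n = split($1, parts, "/")   # last path component is the counter name
            split($2, vals, " ")        # first token after the colon is the count
            if (vals[1] + 0 > 0) print vals[1], parts[n]
        }
    ' | sort -rn
}
```

Usage would be `grep . -r /sys/firmware/acpi/interrupts/ | top_acpi_counters`. The counters are cumulative since boot, so running it twice a few seconds apart and comparing the two outputs shows which ones are still increasing.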

I'm out of ideas about what could be causing this behavior, so any help is appreciated.

Thanks in advance!

    
by Łukasz Koniecki 22.12.2017 / 13:55

0 answers