So we have set up a server ( 11.0-RELEASE-p2 ) that hosts about 150-200 jails. The server has 24 cores and 192GB of RAM. top shows no signs of stress - except for the high load. All jails reside on NFS mounts, and every jail mounts its own directory on creation.
The server doesn't feel slow in any way - it's actually quite snappy. The only thing that bothers us is the high load we are seeing.
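To figure out what is actually contributing to the load, my first idea is to count processes by state (just a quick sketch):

ps -axo state,ucomm | sort | uniq -c | sort -rn | head -20

If a lot of them show up in run or disk-wait states, that would at least point at where the load number comes from.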
Output from top:
last pid: 71841; load averages: 320.13, 131.33, 79.28 up 27+17:45:03 10:37:48
5325 processes: 1 running, 5324 sleeping
CPU: 4.4% user, 0.0% nice, 1.6% system, 0.4% interrupt, 93.6% idle
Mem: 3116M Active, 23G Inact, 23G Wired, 900M Buf, 138G Free
ARC: 10G Total, 2612M MFU, 4553M MRU, 37M Anon, 89M Header, 2742M Other
Swap: 4096M Total, 4096M Free
As you can see, the load is high, there is 138G of free memory, and the CPU is 94% idle.
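Since plain top only shows userland processes by default, maybe kernel threads are what is piling up. If I have the flags right (-S for system processes, -H for threads, -I to hide idle ones, -j for a jail ID column), something like this should show them:

top -SHIj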
Output from systat -vmstat:
3 users Load 92.59 105 73.97 Feb 1 10:39
Mem usage: 26%Phy 6%Kmem
Mem: KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 21491k 223884 120800k 555864 144668k count
All 22230k 836948 142997k 4351592 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt ioflt 3595 total
104 5k 13k 5848 20k 1362 127 1646 147 cow atkbd0 1
730 zfod 1 ata1 15
1.8%Sys 0.3%Intr 3.0%User 0.0%Nice 94.9%Idle ozfod ohci0 ohci
| | | | | | | | | | %ozfod ehci0 ohci
=>> daefr 107 cpu0:timer
dtbuf 622 prcfr 722 bce0 259
Namei Name-cache Dir-cache 3237762 desvn 2014 totfr 619 bce1 260
Calls hits % hits % 3237760 numvn react pcib7 263
41265 41201 100 2713450 frevn pdwak 21 mps0 264
1290 pdpgs ciss0 265
Disks da0 da1 cd0 pass0 pass1 pass2 intrn 74 cpu13:time
KB/t 13.33 14.76 0.00 0.00 0.00 0.00 24315624 wire 112 cpu4:timer
tps 10 17 0 0 0 0 3192008 act 147 cpu2:timer
MB/s 0.14 0.24 0.00 0.00 0.00 0.00 23921440 inact 54 cpu3:timer
%busy 0 0 0 0 0 0 cache 132 cpu5:timer
144669k free 52 cpu1:timer
921954 68 cpu19:time
99 cpu21:time
54 cpu20:time
59 cpu18:time
59 cpu22:time
82 cpu23:time
67 cpu12:time
68 cpu6:timer
79 cpu14:time
88 cpu15:time
111 cpu16:time
93 cpu17:time
49 cpu8:timer
251 cpu7:timer
102 cpu9:timer
176 cpu10:time
49 cpu11:time
As far as I can tell, nothing looks too strange there either. Sure, there are some interrupts, but from what googling turns up, the interrupt counts we see are nothing compared to what people report when they actually have interrupt problems - those are more in the range of 350,000 interrupts.
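For completeness, the per-device interrupt totals and rates since boot can also be pulled with:

vmstat -i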
iostat -w 1
tty da0 da1 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
1 571 14.51 11 0.15 14.56 11 0.15 0.00 0 0.00 1 0 1 0 99
0 231 10.29 90 0.90 11.26 102 1.12 0.00 0 0.00 3 0 1 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 1 0 96
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 7 0 1 0 92
0 79 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 3 0 2 0 95
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 6 0 2 0 93
0 77 13.63 128 1.71 11.97 123 1.44 0.00 0 0.00 2 0 2 0 96
0 79 36.00 1 0.04 14.86 7 0.10 0.00 0 0.00 2 0 1 0 97
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 76 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 80 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 97
0 75 9.98 117 1.15 18.43 129 2.32 0.00 0 0.00 3 0 1 0 96
0 81 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 4 0 2 0 94
0 78 0.00 0 0.00 0.00 0 0.00 0.00 0 0.00 2 0 1 0 96
vmstat -w 1
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
3 0 0 115G 138G 297 0 2 0 653 373 0 0 224 59 1405 1 1 99
2 0 0 115G 138G 75 0 0 0 2017 1368 118 109 2299 23370 18920 6 2 92
2 0 0 115G 138G 1397 0 2 0 2839 1434 0 0 2665 30985 23294 5 4 91
2 0 0 115G 138G 1113 0 0 0 666 1373 0 0 2222 23078 17157 5 2 93
1 0 0 115G 138G 7 0 0 0 597 1368 0 0 590 18529 10477 2 1 96
1 0 0 115G 138G 0 0 2 0 194 2773 83 81 1269 26734 19190 3 3 94
1 0 0 115G 138G 9 0 0 0 90 1404 0 0 833 18907 11455 2 2 96
2 0 0 115G 138G 13 0 0 0 1309 1374 0 0 3185 25773 20054 3 3 94
1 0 0 115G 138G 1419 0 0 0 2750 1369 0 0 3899 25403 23252 7 4 90
0 0 0 115G 138G 776 0 1 0 164 1368 75 58 837 26261 16368 3 3 94
1 0 0 115G 138G 2336 0 5 0 2562 1367 0 0 1337 23287 13288 3 3 94
0 0 0 115G 138G 560 0 0 0 1193 2785 0 0 608 27176 14512 5 5 90
1 0 0 115G 138G 0 0 2 0 249 1369 0 0 702 18533 10700 1 2 97
1 0 0 115G 138G 3290 0 0 0 2313 1369 91 96 1461 22049 14726 6 3 91
About NFS I don't really know how to look for problems there. But here is some output from
nfsstat -c
Client Info:
Rpc Counts:
Getattr Setattr Lookup Readlink Read Write Create Remove
44956931 1020943 93567574 167 23609403 879028 514647 665228
Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
36867 1387 1 24655 21955 6118822 0 26166205
Mknod Fsstat Fsinfo PathConf Commit
0 5489407 1 2270 830867
Rpc Info:
TimedOut Invalid X Replies Retries Requests
0 0 0 0 203906224
Cache Info:
Attr Hits Misses Lkup Hits Misses BioR Hits Misses BioW Hits Misses
-719986429 44956925 -1243965171 93531884 66678251 22460288 981123 879028
BioRLHits Misses BioD Hits Misses DirE Hits Misses Accs Hits Misses
144 167 14572148 5721030 5124486 1455 -1123294109 26165764
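(I assume the negative numbers in the cache counters are just 32-bit counters wrapping around after 27 days of uptime, not a real problem.) If the mount options matter, I believe nfsstat can report those as well:

nfsstat -m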
Here is also the output from
nfsstat -w 1 -c
GtAttr Lookup Rdlink Read Write Rename Access Rddir
5 0 0 5 0 0 0 2
9 342 0 9 0 0 42 9
12 91 0 21 0 0 21 4
0 2 0 0 0 0 2 0
0 1 0 0 0 0 0 0
0 5 0 0 0 0 2 0
5 124 0 5 0 0 0 2
6 12 0 5 0 0 12 2
4 0 0 5 0 0 0 2
9 0 0 10 0 0 0 4
4 0 0 5 0 0 0 2
50 1 0 14 0 0 0 7
and finally the output from
systat -ifstat
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 29.6
Interface Traffic Peak Total
lo0 in 34.285 KB/s 291.936 KB/s 69.263 GB
out 34.285 KB/s 291.936 KB/s 69.263 GB
bce1 in 792.808 KB/s 5.382 MB/s 707.266 GB
out 56.828 KB/s 238.912 KB/s 91.154 GB
bce0 in 21.711 KB/s 21.711 KB/s 17.338 GB
out 13.799 KB/s 287.402 KB/s 64.000 GB
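I can also watch the NICs second by second; the errs and drops columns should show whether the interfaces themselves are struggling:

netstat -w 1 -I bce1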
As requested, dmesg:
[larsemil@prison01 ~]$ dmesg
Limiting open port RST response from 213 to 200 packets/sec
Limiting open port RST response from 2636 to 200 packets/sec
pid 22548 (php-fpm), uid 10000: exited on signal 11
pid 26938 (wkhtmltopdf), uid 10000: exited on signal 6 (core dumped)
[zone: pf states] PF states limit reached
Limiting icmp ping response from 9592 to 200 packets/sec
Limiting icmp ping response from 611 to 200 packets/sec
Limiting icmp ping response from 1792 to 200 packets/sec
Limiting icmp ping response from 2650 to 200 packets/sec
Limiting icmp ping response from 316 to 200 packets/sec
Limiting icmp ping response from 1758 to 200 packets/sec
Limiting icmp ping response from 2478 to 200 packets/sec
Limiting icmp ping response from 578 to 200 packets/sec
Limiting icmp ping response from 2028 to 200 packets/sec
Limiting icmp ping response from 3175 to 200 packets/sec
Limiting icmp ping response from 245 to 200 packets/sec
Limiting icmp ping response from 536 to 200 packets/sec
Limiting icmp ping response from 229 to 200 packets/sec
Limiting icmp ping response from 546 to 200 packets/sec
Limiting icmp ping response from 2239 to 200 packets/sec
Limiting icmp ping response from 3414 to 200 packets/sec
Limiting icmp ping response from 3033 to 200 packets/sec
Limiting icmp ping response from 1018 to 200 packets/sec
Limiting icmp ping response from 270 to 200 packets/sec
pid 34239 (php-fpm), uid 10000: exited on signal 11
pid 68427 (php-fpm), uid 10000: exited on signal 11
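The "[zone: pf states] PF states limit reached" line looks suspicious to me. If pfctl -s info shows the state table maxed out, I guess we could raise the limit in /etc/pf.conf (a sketch, assuming the default limit of 10000 is in effect):

pfctl -s info
set limit states 100000

(The "Limiting ... to 200 packets/sec" lines should just be the net.inet.icmp.icmplim sysctl, which defaults to 200.)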
Any ideas are welcome!