I have a (large) system whose caching behavior I don't understand. It looks like the page cache left the system out of memory. How is that possible? Doesn't the kernel reclaim the cache when there is memory pressure?
One of our users ran a 'cp *' between two mount points. There were many large files on that mount point. Within 20 minutes of the cp starting, free memory dropped from 190G to 3G. That is fine: the kernel was using the memory for cache. However, after several hours of running, page_allocation_errors started to appear and the system hit a kernel panic. Shouldn't the cache have been reclaimed? What am I missing?
sar output:
04:30:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
04:30:01 AM 190117832 338882216 64.06 821904 98713680 88991700 16.06
04:40:01 AM 94106312 434893736 82.21 822088 194964388 89050160 16.07
04:50:01 AM 3014152 525985896 99.43 821300 286266580 87867416 15.86
05:00:01 AM 4209852 524790196 99.20 801136 286031744 88871652 16.04
05:10:01 AM 4714172 524285876 99.11 801344 287214104 88159636 15.91
05:20:01 AM 5400216 523599832 98.98 805096 286247064 90627168 16.35
05:30:01 AM 5511400 523488648 98.96 805416 287312192 88569240 15.98
05:40:01 AM 5650476 523349572 98.93 805236 287706616 88528168 15.98
05:50:01 AM 5691756 523308292 98.92 805700 288392120 88720760 16.01
06:00:01 AM 5445876 523554172 98.97 805884 288012648 88708384 16.01
06:10:01 AM 5509688 523490360 98.96 805752 288311060 88776544 16.02
06:20:01 AM 5403656 523596392 98.98 805992 288453688 88767020 16.02
06:30:01 AM 5439672 523560376 98.97 806224 289253584 87783180 15.84
06:40:01 AM 5454604 523545444 98.97 806540 289048948 88136728 15.90
06:50:01 AM 5542920 523457128 98.95 806900 289134376 87709596 15.83
07:00:01 AM 6186992 522813056 98.83 807312 288270024 88280072 15.93
07:10:02 AM 6160340 522839708 98.84 807568 288291456 88446384 15.96
07:20:01 AM 6178224 522821824 98.83 807832 288347544 88303360 15.93
07:30:01 AM 5910668 523089380 98.88 808076 288161444 88938156 16.05
07:40:01 AM 5986916 523013132 98.87 808332 287596868 89381608 16.13
07:50:01 AM 5987924 523012124 98.87 808524 287766636 88892100 16.04
08:00:01 AM 6176552 522823496 98.83 808628 286743392 90363500 16.31
08:10:01 AM 6307184 522692864 98.81 808780 286462580 90791068 16.38
08:20:01 AM 6016008 522984040 98.86 808972 286270564 91520944 16.52
08:30:01 AM 6066140 522933908 98.85 809212 285822480 92202952 16.64
08:40:01 AM 6122212 522877836 98.84 809340 285830432 92116260 16.62
08:50:01 AM 6215360 522784688 98.83 809420 285934580 91682044 16.54
09:00:01 AM 6144172 522855876 98.84 809504 285484836 92590744 16.71
09:10:01 AM 6216228 522783820 98.82 809612 285656692 92262828 16.65
09:20:01 AM 6098812 522901236 98.85 809968 286012856 91611356 16.53
09:30:01 AM 6082072 522917976 98.85 811316 285738912 92147848 16.63
09:40:01 AM 6026820 522973228 98.86 811232 285632132 92670104 16.72
09:50:01 AM 5993500 523006548 98.87 810980 285557268 92356968 16.67
10:00:01 AM 6103388 522896660 98.85 811184 284710840 93299232 16.84
10:10:01 AM 6224652 522775396 98.82 811448 284692440 93302020 16.84
10:20:01 AM 6206720 522793328 98.83 811560 285226000 92498224 16.69
10:30:01 AM 5946256 523053792 98.88 823404 285104536 93159260 16.81
Average: 85971238 443028810 83.75 812425 204755681 90070609 16.25
10:32:29 AM LINUX RESTART
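To make the table above easier to read, here is a minimal Python sketch that splits kbmemused into the part accounted for by kbbuffers and kbcached and the part that is not. It assumes kbmemused already includes buffers and cache, which matches this data since kbmemfree + kbmemused stays constant across samples. The names `SAMPLE`, `ROW`, and `non_cache_used` are hypothetical, and only three rows from the table are embedded for illustration.

```python
import re

# Three rows copied from the sar output above (start, after the drop, last sample).
SAMPLE = """\
04:30:01 AM 190117832 338882216 64.06 821904 98713680 88991700 16.06
04:50:01 AM 3014152 525985896 99.43 821300 286266580 87867416 15.86
10:30:01 AM 5946256 523053792 98.88 823404 285104536 93159260 16.81
"""

# timestamp, kbmemfree, kbmemused, %memused (skipped), kbbuffers, kbcached
ROW = re.compile(
    r"^(\d\d:\d\d:\d\d [AP]M)\s+(\d+)\s+(\d+)\s+[\d.]+\s+(\d+)\s+(\d+)\s"
)


def non_cache_used(text):
    """Return (timestamp, total_kb, used_minus_buffers_and_cache_kb) per data row.

    Assumption: kbmemused = total - kbmemfree, so it includes buffers and cache.
    Header, "Average:" and "LINUX RESTART" lines do not match ROW and are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        free, used, buffers, cached = (int(g) for g in m.groups()[1:])
        rows.append((m.group(1), free + used, used - buffers - cached))
    return rows


if __name__ == "__main__":
    for ts, total_kb, other_kb in non_cache_used(SAMPLE):
        print(f"{ts}  total={total_kb} kB  used excluding buffers/cache={other_kb} kB")
```

For these three samples the non-cache figure comes out at 239346632, 238898016, and 237125852 kB, so the drop in free memory between 04:30 and 04:50 is accounted for by the growth in kbcached rather than by other allocations.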