I have a custom-built Ubuntu 11.04 server with a 6-disk RAID 10 as its primary drive. It mainly runs PostgreSQL plus a few other utilities that stream data from the web. Often, after a few hours of uptime, the server becomes sluggish across all kinds of processes. For example, it can take 10 to 15 seconds after login to get a shell prompt, 5 to 10 seconds for top to appear, and a second or two for an ls to finish.
When I look at top, there is almost no CPU usage. The PostgreSQL server uses a fair amount of memory, but not enough to push anything into swap.
I have no idea where to go from here, other than suspecting the RAID 10 (before this I had only ever used software RAID 1).
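If the RAID 10 is the suspect, one quick first check is whether the array is running degraded. This is a minimal sketch, assuming the array is Linux software RAID (md), as the /dev/md0 entry in the df output suggests; the function name md_degraded is hypothetical. In /proc/mdstat each array has a status field such as [UUUUUU], where U is an active member and _ marks a missing or failed one.

```shell
# Hypothetical helper: read /proc/mdstat-formatted text on stdin and
# print any md array whose member-status field contains "_".
md_degraded() {
  awk '
    /^md[0-9]+ :/ { array = $1 }            # remember the current array name
    match($0, /\[[U_]+\]/) {                # status field like [UUUUUU]
      status = substr($0, RSTART, RLENGTH)
      if (status ~ /_/) print array, "degraded", status
    }'
}
```

Usage would be `md_degraded < /proc/mdstat`; no output means every array reports all members active. `mdadm --detail /dev/md0` gives a fuller per-member view.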
Edit: output of top:
top - 11:56:03 up 1:46, 3 users, load average: 0.89, 0.73, 0.72
Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 93.5%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16325596k total, 3478248k used, 12847348k free, 20880k buffers
Swap: 19534176k total, 0k used, 19534176k free, 3041992k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1747 woodsp 20 0 109m 10m 4888 S 1 0.1 0:42.70 python
357 root 20 0 0 0 0 S 0 0.0 0:00.40 jbd2/sda3-8
1 root 20 0 24324 2284 1344 S 0 0.0 0:00.84 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:00.24 ksoftirqd/0
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
7 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/0
8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
10 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/1
12 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/1
13 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2
14 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/2:0
15 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2
16 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/2
17 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3
18 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/3:0
19 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/3
20 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/3
21 root 0 -20 0 0 0 S 0 0.0 0:00.00 cpuset
22 root 0 -20 0 0 0 S 0 0.0 0:00.00 khelper
23 root 20 0 0 0 0 S 0 0.0 0:00.00 kdevtmpfs
24 root 0 -20 0 0 0 S 0 0.0 0:00.00 netns
26 root 20 0 0 0 0 S 0 0.0 0:00.00 sync_supers
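The Cpu(s) line in top splits CPU time into user (us), system (sy), idle (id) and I/O wait (wa), among others. "Almost no CPU usage" describes the low us/sy figures; wa is time the CPUs sat idle while disk I/O was outstanding, so it is worth reading on its own. A minimal sketch for pulling one of these fields out, assuming the older procps format shown above (fields like 6.2%wa); top_cpu is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one field (us, sy, id, wa, ...)
# from a top "Cpu(s):" line read on stdin.
top_cpu() {
  awk -v key="%$1" '
    /^Cpu\(s\):/ {
      for (i = 2; i <= NF; i++) {
        f = $i
        sub(/,$/, "", f)                       # drop trailing comma
        n = length(f) - length(key)
        if (n > 0 && substr(f, n + 1) == key)  # field ends with e.g. "%wa"
          print substr(f, 1, n)
      }
    }'
}
```

For example, `top -b -n 1 | top_cpu wa` prints the current I/O wait percentage; run repeatedly during a slow period, it shows whether wa rises while us/sy stay low.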
df -h
rpsharp@ncp-skookum:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 1.8T 549G 1.2T 32% /
udev 7.8G 4.0K 7.8G 1% /dev
tmpfs 3.2G 492K 3.2G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 7.8G 0 7.8G 0% /run/shm
/dev/sda2 952M 128K 952M 1% /boot/efi
/dev/md0 5.5T 562G 4.7T 11% /usr/local
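According to this df output, the root filesystem is on /dev/sda3, while the md array /dev/md0 is mounted on /usr/local. When matching per-disk measurements (such as the hdparm results further down) to filesystems, it helps to look up which device backs a given mount point. A minimal sketch; df_device_for is a hypothetical name, and it assumes one line per filesystem, which `df -P` guarantees.

```shell
# Hypothetical helper: given a mount point, print the device that backs it,
# reading df-style output (header line first) on stdin.
df_device_for() {
  awk -v mp="$1" 'NR > 1 && $NF == mp { print $1 }'
}
```

Usage: `df -P | df_device_for /` and `df -P | df_device_for /usr/local`.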
free -m
psharp@ncp-skookum:~$ free -m
total used free shared buffers cached
Mem: 15942 3409 12533 0 20 2983
-/+ buffers/cache: 405 15537
Swap: 19076 0 19076
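The "used" figure includes buffers and page cache, which the kernel hands back on demand; the -/+ buffers/cache line subtracts them to show what processes actually hold. A minimal sketch of that arithmetic, using the kB values from the top header above (top and free were run at slightly different moments, so the raw figures differ a little); app_mem_mib is a hypothetical helper.

```shell
# Hypothetical helper: memory held by processes, in MiB,
# computed as used - buffers - cached (all arguments in kB).
app_mem_mib() {
  echo $(( ($1 - $2 - $3) / 1024 ))
}
```

With top's figures, `app_mem_mib 3478248 20880 3041992` gives 405, the same as the 405 in free's -/+ buffers/cache row, out of roughly 16 GB.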
tail -50 /var/log/syslog
Jul 3 06:31:32 ncp-skookum rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1070" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jul 3 06:39:01 ncp-skookum CRON[14211]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 06:40:01 ncp-skookum CRON[14223]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:00:01 ncp-skookum CRON[14328]: (woodsp) CMD (/home/woodsp/bin/mail_tweetupdate # email an update)
Jul 3 07:00:01 ncp-skookum CRON[14327]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:00:28 ncp-skookum sendmail[14356]: q63E0SoZ014356: from=woodsp, size=2328, class=0, nrcpts=2, msgid=<[email protected]>, relay=woodsp@localhost
Jul 3 07:00:29 ncp-skookum sm-mta[14357]: q63E0Si6014357: from=<[email protected]>, size=2569, class=0, nrcpts=2, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1]
Jul 3 07:00:29 ncp-skookum sendmail[14356]: q63E0SoZ014356: to=Spencer Wood <[email protected]>,Martin Lacayo <[email protected]>, ctladdr=woodsp (1004/1005), delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=62328, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q63E0Si6014357 Message accepted for delivery)
Jul 3 07:00:29 ncp-skookum sm-mta[14359]: STARTTLS=client, relay=mx3.stanford.edu., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-SHA, bits=256/256
Jul 3 07:00:29 ncp-skookum sm-mta[14359]: q63E0Si6014357: to=<[email protected]>,<[email protected]>, ctladdr=<[email protected]> (1004/1005), delay=00:00:01, xdelay=00:00:00, mailer=esmtp, pri=152569, relay=mx3.stanford.edu. [171.67.219.73], dsn=2.0.0, stat=Sent (Ok: queued as 8F3505802AC)
Jul 3 07:09:08 ncp-skookum CRON[14396]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 07:17:01 ncp-skookum CRON[14438]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 3 07:20:01 ncp-skookum CRON[14453]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 07:39:01 ncp-skookum CRON[14551]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 07:40:01 ncp-skookum CRON[14562]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:00:01 ncp-skookum CRON[14668]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:09:01 ncp-skookum CRON[14724]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 08:17:01 ncp-skookum CRON[14766]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 3 08:20:01 ncp-skookum CRON[14781]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
Jul 3 08:39:01 ncp-skookum CRON[14881]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete)
Jul 3 08:40:01 ncp-skookum CRON[14892]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp)
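This syslog excerpt consists entirely of routine CRON, sendmail/sm-mta and rsyslogd entries. To see whether anything else, for example kernel messages about the disks, was logged around a slow period, those routine entries can be filtered out. A minimal sketch; the program list is taken from this excerpt and is an assumption for other systems, and syslog_unusual is a hypothetical name.

```shell
# Hypothetical helper: drop lines from the routine programs seen in this
# excerpt and print everything else (e.g. kernel messages).
syslog_unusual() {
  grep -vE ' (CRON|sendmail|sm-mta|rsyslogd)(\[[0-9]+\])?: '
}
```

Usage: `syslog_unusual < /var/log/syslog`; `dmesg` shows the kernel ring buffer directly.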
Output of hdparm -t /dev/sd{a,b,c,d,e,f}. Does this look suspicious?
/dev/sda:
Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec
/dev/sdb:
Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec
/dev/sdc:
Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec
/dev/sdd:
Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec
/dev/sde:
Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec
/dev/sdf:
Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec
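Five of the six disks read at roughly 130 to 140 MB/sec, while /dev/sda manages 423.39 kB/sec, more than 300 times slower. Because the outlier is reported in a different unit, it is easy to miss when scanning. A minimal sketch that normalizes each result to MB/sec and flags disks below a threshold; the 50 MB/sec default is an arbitrary assumption, and hdparm_slow is a hypothetical name.

```shell
# Hypothetical helper: read "hdparm -t" output on stdin and print each
# device whose buffered read rate is below $1 MB/sec (default 50).
hdparm_slow() {
  awk -v min="${1:-50}" '
    BEGIN { min += 0 }
    /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # "/dev/sda:" header line
    /Timing buffered disk reads/ {
      rate = $(NF - 1) + 0; unit = $NF           # e.g. 423.39 and "kB/sec"
      if (unit == "kB/sec") rate /= 1024
      else if (unit == "GB/sec") rate *= 1024
      if (rate < min) printf "%s %.2f MB/sec\n", dev, rate
    }'
}
```

Usage: `sudo hdparm -t /dev/sd[a-f] | hdparm_slow`. On the results above this reports only /dev/sda.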