Eu tenho um aplicativo de teste que grava uma string em um arquivo 10000 vezes. Eu corro isso em dois servidores e obtenho resultados totalmente diferentes.
[SYSTEM 1]
$ time ./test_depth
real 0m0.152s
user 0m0.070s
sys 0m0.082s
$ strace -cT ./test_depth
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
57.38 0.000420 0 10000 write
30.19 0.000221 0 9924 fadvise64
12.43 0.000091 0 10000 gettid
0.00 0.000000 0 11 read
0.00 0.000000 0 66 54 open
0.00 0.000000 0 11 close
0.00 0.000000 0 7 6 stat
0.00 0.000000 0 13 fstat
0.00 0.000000 0 2 lseek
0.00 0.000000 0 32 mmap
0.00 0.000000 0 15 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 2 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 uname
0.00 0.000000 0 3 fcntl
0.00 0.000000 0 1 1 unlink
0.00 0.000000 0 1 symlink
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 3 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000732 30105 63 total
[SYSTEM 2]
$ time ./test_depth
real 1m0.252s
user 0m0.163s
sys 0m0.307s
$ strace -cT ./test_depth
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
98.29 0.147977 15 10000 write
1.56 0.002355 0 9925 fadvise64
0.14 0.000217 0 10000 gettid
0.00 0.000006 1 11 close
0.00 0.000000 0 11 read
0.00 0.000000 0 57 45 open
0.00 0.000000 0 3 3 stat
0.00 0.000000 0 13 fstat
0.00 0.000000 0 2 lseek
0.00 0.000000 0 32 mmap
0.00 0.000000 0 15 mprotect
0.00 0.000000 0 2 munmap
0.00 0.000000 0 3 brk
0.00 0.000000 0 2 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 uname
0.00 0.000000 0 3 fcntl
0.00 0.000000 0 1 unlink
0.00 0.000000 0 1 symlink
0.00 0.000000 0 1 getrlimit
0.00 0.000000 0 1 arch_prctl
0.00 0.000000 0 3 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.150555 30093 50 total
O processo é executado por cerca de um minuto no segundo sistema. E olhando para o strace, ele gasta 15 micro segundos em cada chamada write
. Como continuo diagnosticando o problema daqui? Parece que a gravação do disco está demorando mais no segundo sistema. Como posso ter certeza se é realmente isso? E o que mais eu deveria estar olhando?
As montagens que estão sendo usadas nos dois casos:
1) - /dev/sda1 on / type ext4 (rw)
2) - /dev/mapper/VolGroup-lv_home on /home type ext4 (rw)
A propósito, os logs estão sendo escritos usando glog
lib. Se o printf
simples for usado, esse problema não aparecerá. Então, parece um problema com a biblioteca. : (
[root@localhost src]# hdparm -I /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ATA device, with non-removable media
Standards:
Likely used: 1
Configuration:
Logical max current
cylinders 0 0
heads 0 0
sectors/track 0 0
--
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 0 MBytes
device size with M = 1000*1000: 0 MBytes
cache/buffer size = unknown
Capabilities:
IORDY not likely
Cannot perform double-word IO
R/W multiple sector transfer: not supported
DMA: not supported
PIO: pio0
[root@localhost src]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-lv_root
51606140 6267948 42716752 13% /
tmpfs 8141564 1016 8140548 1% /dev/shm
/dev/sda3 495844 37728 432516 9% /boot
/dev/mapper/VolGroup-lv_home
1090839440 1022652 1034405284 1% /home
[root@localhost src]# ./lshw -class disk
*-disk
description: SCSI Disk
product: PERC H310
vendor: DELL
physical id: 2.0.0
bus info: scsi@0:2.0.0
logical name: /dev/sda
version: 2.12
serial: 00677e1515fc98ef19000d93f7a0a38c
size: 1117GiB (1199GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=31c27a66
SOs:
[SYSTEM 1]
Linux snapoffice 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
[SYSTEM 2]
Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Tags performance linux