I am having some performance problems with my QEMU/KVM guests on my Ceph cluster. The cluster has 4 nodes with 4x1TB drives each, 48/64GB of RAM, and Intel Xeon and AMD Opteron CPUs. The nodes are interconnected through 3x1 GBit interfaces configured as a bonded interface. Overall network traffic is very high at the moment. From time to time there are blocked IOs and I don't know exactly why. The OSD and KVM hosts run Ubuntu 14.04 LTS with kernel 3.13.0. Is there a switch I forgot to flip? Maybe you can help me with this, because I am at my wit's end.
A snippet from the log with blocked IOs:
2015-11-10 08:03:52.597054 mon.0 10.14.0.6:6789/0 546966 : cluster [INF] HEALTH_WARN; 1 requests are blocked > 32 sec
2015-11-10 08:04:41.993675 osd.13 10.14.0.76:6814/5175 106 : cluster [WRN] 30 slow requests, 30 included below; oldest blocked for > 30.207798 secs
2015-11-10 08:04:42.993975 osd.13 10.14.0.76:6814/5175 112 : cluster [WRN] 32 slow requests, 27 included below; oldest blocked for > 31.208280 secs
2015-11-10 08:04:43.994367 osd.13 10.14.0.76:6814/5175 118 : cluster [WRN] 35 slow requests, 25 included below; oldest blocked for > 32.208673 secs
2015-11-10 08:04:44.994712 osd.13 10.14.0.76:6814/5175 124 : cluster [WRN] 25 slow requests, 16 included below; oldest blocked for > 33.205598 secs
2015-11-10 08:04:45.995052 osd.13 10.14.0.76:6814/5175 130 : cluster [WRN] 26 slow requests, 15 included below; oldest blocked for > 34.124413 secs
2015-11-10 08:04:46.995360 osd.13 10.14.0.76:6814/5175 136 : cluster [WRN] 24 slow requests, 11 included below; oldest blocked for > 35.124517 secs
2015-11-10 08:04:47.995689 osd.13 10.14.0.76:6814/5175 142 : cluster [WRN] 22 slow requests, 6 included below; oldest blocked for > 36.124712 secs
2015-11-10 08:04:48.996059 osd.13 10.14.0.76:6814/5175 148 : cluster [WRN] 9 slow requests, 1 included below; oldest blocked for > 37.122843 secs
2015-11-10 08:05:05.238556 osd.13 10.14.0.76:6814/5175 150 : cluster [WRN] 12 slow requests, 3 included below; oldest blocked for > 53.365283 secs
2015-11-10 08:05:09.683333 osd.13 10.14.0.76:6814/5175 154 : cluster [WRN] 16 slow requests, 4 included below; oldest blocked for > 57.809976 secs
2015-11-10 08:05:11.895482 osd.13 10.14.0.76:6814/5175 159 : cluster [WRN] 18 slow requests, 11 included below; oldest blocked for > 60.022206 secs
2015-11-10 08:05:13.730638 osd.13 10.14.0.76:6814/5175 165 : cluster [WRN] 21 slow requests, 8 included below; oldest blocked for > 61.857323 secs
2015-11-10 08:05:14.731015 osd.13 10.14.0.76:6814/5175 171 : cluster [WRN] 24 slow requests, 6 included below; oldest blocked for > 62.857742 secs
2015-11-10 08:05:15.731261 osd.13 10.14.0.76:6814/5175 177 : cluster [WRN] 35 slow requests, 12 included below; oldest blocked for > 63.857998 secs
2015-11-10 08:05:17.028076 osd.13 10.14.0.76:6814/5175 183 : cluster [WRN] 43 slow requests, 15 included below; oldest blocked for > 65.154773 secs
2015-11-10 08:05:18.127205 osd.13 10.14.0.76:6814/5175 189 : cluster [WRN] 45 slow requests, 12 included below; oldest blocked for > 66.253932 secs
2015-11-10 08:05:19.127468 osd.13 10.14.0.76:6814/5175 195 : cluster [WRN] 48 slow requests, 14 included below; oldest blocked for > 67.254104 secs
2015-11-10 08:05:20.127937 osd.13 10.14.0.76:6814/5175 201 : cluster [WRN] 52 slow requests, 14 included below; oldest blocked for > 68.254581 secs
2015-11-10 08:05:22.065629 osd.13 10.14.0.76:6814/5175 207 : cluster [WRN] 53 slow requests, 14 included below; oldest blocked for > 70.192250 secs
2015-11-10 08:05:23.065965 osd.13 10.14.0.76:6814/5175 213 : cluster [WRN] 57 slow requests, 13 included below; oldest blocked for > 71.192553 secs
2015-11-10 08:05:24.066355 osd.13 10.14.0.76:6814/5175 219 : cluster [WRN] 58 slow requests, 9 included below; oldest blocked for > 72.192932 secs
2015-11-10 08:05:25.066731 osd.13 10.14.0.76:6814/5175 225 : cluster [WRN] 61 slow requests, 7 included below; oldest blocked for > 73.193356 secs
2015-11-10 08:05:26.067590 osd.13 10.14.0.76:6814/5175 231 : cluster [WRN] 62 slow requests, 3 included below; oldest blocked for > 74.193947 secs
2015-11-10 08:05:27.067844 osd.13 10.14.0.76:6814/5175 235 : cluster [WRN] 63 slow requests, 1 included below; oldest blocked for > 75.194501 secs
2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs
2015-11-10 09:13:46.210699 osd.2 10.14.0.75:6804/29163 46 : cluster [WRN] 34 slow requests, 34 included below; oldest blocked for > 30.810297 secs
2015-11-10 09:13:47.211462 osd.2 10.14.0.75:6804/29163 52 : cluster [WRN] 38 slow requests, 33 included below; oldest blocked for > 31.811420 secs
2015-11-10 09:13:48.211718 osd.2 10.14.0.75:6804/29163 58 : cluster [WRN] 40 slow requests, 30 included below; oldest blocked for > 32.811678 secs
2015-11-10 09:13:49.212002 osd.2 10.14.0.75:6804/29163 64 : cluster [WRN] 43 slow requests, 28 included below; oldest blocked for > 33.811957 secs
2015-11-10 09:13:50.213554 osd.2 10.14.0.75:6804/29163 70 : cluster [WRN] 45 slow requests, 25 included below; oldest blocked for > 34.812999 secs
2015-11-10 09:13:51.214046 osd.2 10.14.0.75:6804/29163 76 : cluster [WRN] 50 slow requests, 25 included below; oldest blocked for > 35.813991 secs
2015-11-10 09:13:52.215101 osd.2 10.14.0.75:6804/29163 82 : cluster [WRN] 49 slow requests, 21 included below; oldest blocked for > 36.813431 secs
2015-11-10 09:13:53.215519 osd.2 10.14.0.75:6804/29163 88 : cluster [WRN] 43 slow requests, 19 included below; oldest blocked for > 37.810298 secs
2015-11-10 09:13:54.215797 osd.2 10.14.0.75:6804/29163 94 : cluster [WRN] 19 slow requests, 7 included below; oldest blocked for > 37.922869 secs
2015-11-10 09:13:55.216838 osd.2 10.14.0.75:6804/29163 100 : cluster [WRN] 6 slow requests, 1 included below; oldest blocked for > 37.592385 secs
2015-11-10 09:13:56.217302 osd.2 10.14.0.75:6804/29163 102 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.036856 secs
2015-11-10 10:18:00.293677 osd.0 10.14.0.75:6800/28850 109 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.137196 secs
2015-11-10 10:18:02.295197 osd.0 10.14.0.75:6800/28850 115 : cluster [WRN] 3 slow requests, 3 included below; oldest blocked for > 30.225206 secs
2015-11-10 10:18:03.296209 osd.0 10.14.0.75:6800/28850 119 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.640530 secs
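To make a log like this easier to read, the warnings can be condensed per OSD. The following is only a rough sketch: the regular expression is written against the line format shown above, the function name `summarize` is made up for illustration, and the sample list holds four lines copied from the snippet rather than the full log.

```python
import re
from collections import defaultdict

# Pattern written against the cluster log lines shown above, e.g.
# "... osd.13 10.14.0.76:6814/5175 106 : cluster [WRN] 30 slow requests, ..."
SLOW_RE = re.compile(
    r"^\S+ \S+ (?P<osd>osd\.\d+) \S+ \d+ : cluster \[WRN\] "
    r"(?P<count>\d+) slow requests, \d+ included below; "
    r"oldest blocked for > (?P<secs>[\d.]+) secs"
)


def summarize(lines):
    """Return {osd: (peak slow-request count, longest block in seconds)}."""
    peaks = defaultdict(lambda: (0, 0.0))
    for line in lines:
        m = SLOW_RE.match(line.strip())
        if m is None:
            continue  # not a slow-request warning (e.g. the HEALTH_WARN line)
        count = int(m.group("count"))
        secs = float(m.group("secs"))
        prev_count, prev_secs = peaks[m.group("osd")]
        peaks[m.group("osd")] = (max(prev_count, count), max(prev_secs, secs))
    return dict(peaks)


# Four lines copied verbatim from the snippet above.
sample = [
    "2015-11-10 08:05:27.067844 osd.13 10.14.0.76:6814/5175 235 : cluster [WRN] 63 slow requests, 1 included below; oldest blocked for > 75.194501 secs",
    "2015-11-10 08:05:32.306675 osd.13 10.14.0.76:6814/5175 237 : cluster [WRN] 59 slow requests, 1 included below; oldest blocked for > 80.433195 secs",
    "2015-11-10 09:13:51.214046 osd.2 10.14.0.75:6804/29163 76 : cluster [WRN] 50 slow requests, 25 included below; oldest blocked for > 35.813991 secs",
    "2015-11-10 10:18:00.293677 osd.0 10.14.0.75:6800/28850 109 : cluster [WRN] 5 slow requests, 5 included below; oldest blocked for > 30.137196 secs",
]

for osd, (count, secs) in summarize(sample).items():
    print(f"{osd}: peak {count} slow requests, longest block {secs:.1f} s")
```

Feeding it the whole log file instead of `sample` (for example `summarize(open(path))`, with `path` pointing at wherever the cluster log lives on the monitor host) would give one line per affected OSD.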
Here is our current ceph.conf:
[global]
fsid = xxx
mon_initial_members = mon1 mon2 mon3
mon_host = 10.14.0.6
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd pool default size = 3
public network = 10.14.0.0/24
cluster network = 10.14.0.0/24
rbd default format = 2
[osd]
osd journal size = 10240
osd recovery max active = 1
osd max backfills = 1
filestore max sync interval = 30 # just for testing
filestore min sync interval = 29 # no impact detectable
This is the OSD tree:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 14.23999 root default
-6 3.56000 host host1
8 0.89000 osd.8 up 1.00000 1.00000
9 0.89000 osd.9 up 1.00000 1.00000
10 0.89000 osd.10 up 1.00000 1.00000
11 0.89000 osd.11 up 1.00000 1.00000
-2 3.56000 host host2
2 0.89000 osd.2 up 1.00000 1.00000
5 0.89000 osd.5 up 1.00000 1.00000
7 0.89000 osd.7 up 1.00000 1.00000
0 0.89000 osd.0 up 0.79143 1.00000
-4 3.56000 host host3
12 0.89000 osd.12 up 1.00000 1.00000
13 0.89000 osd.13 up 1.00000 1.00000
14 0.89000 osd.14 up 1.00000 1.00000
15 0.89000 osd.15 up 1.00000 1.00000
-3 3.56000 host host4
1 0.89000 osd.1 up 1.00000 1.00000
3 0.89000 osd.3 up 1.00000 1.00000
4 0.89000 osd.4 up 1.00000 1.00000
6 0.89000 osd.6 up 0.86749 1.00000
This is the osd df output:
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
8 0.89000 1.00000 916G 556G 359G 60.75 1.03
9 0.89000 1.00000 916G 564G 351G 61.61 1.05
10 0.89000 1.00000 916G 514G 402G 56.12 0.95
11 0.89000 1.00000 916G 510G 406G 55.68 0.95
2 0.89000 1.00000 916G 586G 329G 64.06 1.09
5 0.89000 1.00000 916G 456G 459G 49.85 0.85
7 0.89000 1.00000 915G 546G 368G 59.71 1.02
0 0.89000 0.79143 916G 615G 300G 67.16 1.14
12 0.89000 1.00000 916G 472G 443G 51.61 0.88
13 0.89000 1.00000 916G 628G 287G 68.60 1.17
14 0.89000 1.00000 916G 540G 375G 59.01 1.00
15 0.89000 1.00000 916G 596G 319G 65.15 1.11
1 0.89000 1.00000 916G 553G 362G 60.39 1.03
3 0.89000 1.00000 916G 462G 453G 50.53 0.86
4 0.89000 1.00000 916G 472G 443G 51.58 0.88
6 0.89000 0.86749 916G 540G 375G 58.99 1.00
TOTAL 14657G 8618G 6039G 58.80
MIN/MAX VAR: 0.85/1.17 STDDEV: 5.67
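For cross-checking the usage table against the OSDs named in the log (osd.13, osd.2 and osd.0), a small sketch like the one below ranks the rows by %USE. The rows are copied from the table above; the helper name `rank_by_use` and the `SLOW_OSDS` set are just illustrative. It is only a convenience for reading the data and does not by itself show that fill level is the cause.

```python
# Data rows copied from the osd df output above.
# Columns: ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
OSD_DF = """\
8 0.89000 1.00000 916G 556G 359G 60.75 1.03
9 0.89000 1.00000 916G 564G 351G 61.61 1.05
10 0.89000 1.00000 916G 514G 402G 56.12 0.95
11 0.89000 1.00000 916G 510G 406G 55.68 0.95
2 0.89000 1.00000 916G 586G 329G 64.06 1.09
5 0.89000 1.00000 916G 456G 459G 49.85 0.85
7 0.89000 1.00000 915G 546G 368G 59.71 1.02
0 0.89000 0.79143 916G 615G 300G 67.16 1.14
12 0.89000 1.00000 916G 472G 443G 51.61 0.88
13 0.89000 1.00000 916G 628G 287G 68.60 1.17
14 0.89000 1.00000 916G 540G 375G 59.01 1.00
15 0.89000 1.00000 916G 596G 319G 65.15 1.11
1 0.89000 1.00000 916G 553G 362G 60.39 1.03
3 0.89000 1.00000 916G 462G 453G 50.53 0.86
4 0.89000 1.00000 916G 472G 443G 51.58 0.88
6 0.89000 0.86749 916G 540G 375G 58.99 1.00
"""

# OSDs that reported slow requests in the log snippet.
SLOW_OSDS = {13, 2, 0}


def rank_by_use(table):
    """Return [(osd_id, percent_used), ...] sorted from fullest to emptiest."""
    rows = []
    for line in table.strip().splitlines():
        fields = line.split()
        rows.append((int(fields[0]), float(fields[6])))  # ID and %USE columns
    return sorted(rows, key=lambda row: row[1], reverse=True)


for osd_id, pct in rank_by_use(OSD_DF):
    marker = "  <- slow requests in log" if osd_id in SLOW_OSDS else ""
    print(f"osd.{osd_id:<3} {pct:6.2f}%{marker}")
```

On this data, osd.13 and osd.0 come out as the two fullest OSDs and osd.2 as the fourth, which may or may not be a coincidence.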
Here is an example of a QEMU/KVM domain definition:
<domain type='kvm'>
<name>testvm</name>
<uuid>xxx</uuid>
<memory unit='KiB'>12582912</memory>
<currentMemory unit='KiB'>12582912</currentMemory>
<vcpu placement='static'>4</vcpu>
<os>
<type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
<bootmenu enable='yes'/>
</os>
<features>
<acpi/>
<apic/>
<pae/>
</features>
<cpu mode='custom' match='exact'>
<model fallback='allow'>SandyBridge</model>
<vendor>Intel</vendor>
<feature policy='require' name='pbe'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='est'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='osxsave'/>
<feature policy='require' name='smx'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='ds'/>
<feature policy='require' name='vme'/>
<feature policy='require' name='dtes64'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='dca'/>
<feature policy='require' name='pcid'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='pdcm'/>
<feature policy='require' name='pdpe1gb'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='acpi'/>
<feature policy='require' name='monitor'/>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/bin/kvm-spice</emulator>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
<auth username='admin'>
<secret type='ceph' uuid='xxx'/>
</auth>
<source protocol='rbd' name='vms/testvm'>
<host name='mon1' port='6789'/>
<host name='mon2' port='6789'/>
<host name='mon3' port='6789'/>
</source>
<target dev='sda' bus='scsi'/>
<boot order='1'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='usb' index='0' model='ich9-ehci1'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci1'>
<master startport='0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci2'>
<master startport='2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci3'>
<master startport='4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pci-root'/>
<controller type='scsi' index='0' model='virtio-scsi'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</controller>
<interface type='bridge'>
<mac address='xxx'/>
<source bridge='br0'/>
<model type='virtio'/>
<boot order='2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<graphics type='vnc' port='-1' autoport='yes'/>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
</devices>
</domain>