fio config to measure IOPS against a provider's SLA


A provider has given us 500 IOPS/TB as their SLA standard for disk performance in a VMware & RAID5-SAN environment. This is apparently measured with:

  • 16kB average transfer block size
  • 3:1 read:write ratio
  • Multithreaded I/O operations
  • 80% random I/O pattern
  • 20% cache hit rate

What I want to do is determine whether a specific Linux VM is actually getting this performance, and then run the same benchmark against other providers so I can compare.

From looking around, fio seems to be the most configurable tool for measuring the above. The config I have so far is:

[global]
blocksize=16k
rwmixread=75     # 3:1 read:write ratio
ramp_time=30
runtime=600
time_based
buffered=1
# size = free-ram * 80% / 5
# so we get a ~20% cache hit across the 5x processes
# this is for an 8GB ram host with 7.3GB free after buffers/cache
size=1180m

# create a mix to get to 80% random reads
# also means we'll be doing at least 5x IO operations in parallel
[sla-0]
readwrite=randrw:2

[sla-1]
readwrite=randrw:2

[sla-2]
readwrite=randrw

[sla-3]
readwrite=randrw

[sla-4]
readwrite=randrw
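
The arithmetic behind the comments in the job file can be sketched in a few lines of Python. This is only an illustration: the helper names `random_fraction` and `size_per_job_mb` are made up here, and it assumes each job issues a similar number of I/Os and that 1 GB is counted as 1024 MB.

```python
# Sketch of the arithmetic behind the job file comments.
# Assumption: every job issues a similar number of I/Os, so the
# overall random share is a plain average across jobs.

def random_fraction(jobs):
    """jobs: list of fio 'readwrite' values; returns the averaged random share."""
    shares = []
    for rw in jobs:
        # "randrw:N" asks fio for a new random offset every N I/Os,
        # so roughly 1/N of the I/Os land on a random offset.
        _, _, n = rw.partition(":")
        shares.append(1.0 / int(n) if n else 1.0)
    return sum(shares) / len(shares)


def size_per_job_mb(free_ram_gb, cache_share, job_count):
    """File size per job following the comment: free-ram * 80% / 5."""
    return free_ram_gb * 1024 * cache_share / job_count


jobs = ["randrw:2", "randrw:2", "randrw", "randrw", "randrw"]
print(f"random share: {random_fraction(jobs):.0%}")
print(f"size per job: {size_per_job_mb(7.3, 0.8, len(jobs)):.0f} MB")
```

The size it prints lands close to the 1180m used in the job file; the exact figure depends on how GB is converted to MB and on rounding.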

Any suggestions for improvement? Is using buffered I/O and the default ioengine the best way to go?

If I run this on an otherwise idle machine with 4 virtual cores, 8GB of RAM and 470GB of allocated storage, I would expect to get 235 IOPS according to the above (500 * 0.47). The results I get are:

sla-0: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=sync, iodepth=2
sla-1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=sync, iodepth=2
sla-2: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=sync, iodepth=2
sla-3: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=sync, iodepth=2
sla-4: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=sync, iodepth=2
Starting 5 processes
sla-0: Laying out IO file(s) (1 file(s) / 1180MB)
sla-1: Laying out IO file(s) (1 file(s) / 1180MB)
sla-2: Laying out IO file(s) (1 file(s) / 1180MB)
sla-3: Laying out IO file(s) (1 file(s) / 1180MB)
sla-4: Laying out IO file(s) (1 file(s) / 1180MB)
Jobs: 5 (f=5): [mmmmm] [100.0% done] [5931K/1966K /s] [362/120 iops] [eta 00m:00s] 
sla-0: (groupid=0, jobs=1): err= 0: pid=16701
  read : io=1086MB, bw=1853KB/s, iops=115, runt=600003msec
    clat (usec): min=4, max=1771K, avg=8607.53, stdev=22114.44
    bw (KB/s) : min=    0, max= 4087, per=24.44%, avg=1914.96, stdev=1130.29
  write: io=372416KB, bw=635586B/s, iops=38, runt=600003msec
    clat (usec): min=6, max=2574, avg=57.38, stdev=79.65
    bw (KB/s) : min=    0, max=11119, per=26.07%, avg=679.63, stdev=517.84
  cpu          : usr=0.08%, sys=0.63%, ctx=64513, majf=0, minf=109
  IO depths    : 1=107.4%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=69474/23276, short=0/0
     lat (usec): 10=10.23%, 20=8.89%, 50=4.15%, 100=11.66%, 250=0.83%
     lat (usec): 500=1.48%, 750=1.41%, 1000=0.82%
     lat (msec): 2=0.83%, 4=1.56%, 10=47.07%, 20=5.91%, 50=4.24%
     lat (msec): 100=0.55%, 250=0.29%, 500=0.06%, 750=0.01%, 1000=0.01%
     lat (msec): 2000=0.01%
sla-1: (groupid=0, jobs=1): err= 0: pid=16702
  read : io=963360KB, bw=1605KB/s, iops=100, runt=600180msec
    clat (usec): min=4, max=2396K, avg=9934.23, stdev=30986.37
    bw (KB/s) : min=    0, max= 4657, per=21.64%, avg=1695.89, stdev=1273.00
  write: io=326000KB, bw=556206B/s, iops=33, runt=600180msec
    clat (usec): min=6, max=3882, avg=55.07, stdev=77.92
    bw (KB/s) : min=    0, max=10708, per=23.74%, avg=618.92, stdev=559.01
  cpu          : usr=0.08%, sys=0.53%, ctx=55500, majf=0, minf=129
  IO depths    : 1=108.5%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=60210/20375, short=0/0
     lat (usec): 10=11.36%, 20=9.63%, 50=3.56%, 100=11.97%, 250=0.81%
     lat (usec): 500=0.66%, 750=0.50%, 1000=0.37%
     lat (msec): 2=0.33%, 4=0.74%, 10=49.56%, 20=3.78%, 50=5.48%
     lat (msec): 100=0.60%, 250=0.43%, 500=0.16%, 750=0.04%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%
sla-2: (groupid=0, jobs=1): err= 0: pid=16703
  read : io=827584KB, bw=1379KB/s, iops=86, runt=600012msec
    clat (usec): min=397, max=2396K, avg=11569.59, stdev=31237.03
    bw (KB/s) : min=    0, max= 4237, per=18.60%, avg=1457.59, stdev=1113.89
  write: io=276192KB, bw=471358B/s, iops=28, runt=600012msec
    clat (usec): min=8, max=8339, avg=63.95, stdev=121.52
    bw (KB/s) : min=    0, max= 8531, per=20.52%, avg=534.85, stdev=478.91
  cpu          : usr=0.07%, sys=0.54%, ctx=57019, majf=0, minf=89
  IO depths    : 1=109.9%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=51724/17262, short=0/0
     lat (usec): 10=0.98%, 20=5.38%, 50=3.53%, 100=13.68%, 250=0.92%
     lat (usec): 500=0.60%, 750=0.39%, 1000=0.22%
     lat (msec): 2=0.24%, 4=2.26%, 10=59.15%, 20=4.90%, 50=6.28%
     lat (msec): 100=0.78%, 250=0.48%, 500=0.18%, 750=0.03%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%
sla-3: (groupid=0, jobs=1): err= 0: pid=16704
  read : io=865920KB, bw=1443KB/s, iops=90, runt=600005msec
    clat (usec): min=369, max=2396K, avg=11052.97, stdev=32396.85
    bw (KB/s) : min=    0, max= 5984, per=19.47%, avg=1525.97, stdev=1164.42
  write: io=285568KB, bw=487365B/s, iops=29, runt=600005msec
    clat (usec): min=7, max=11910, avg=65.72, stdev=154.09
    bw (KB/s) : min=    0, max=11064, per=21.38%, avg=557.30, stdev=534.59
  cpu          : usr=0.07%, sys=0.57%, ctx=59458, majf=0, minf=109
  IO depths    : 1=109.5%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=54120/17848, short=0/0
     lat (usec): 10=0.99%, 20=5.11%, 50=3.58%, 100=13.64%, 250=0.89%
     lat (usec): 500=0.71%, 750=0.48%, 1000=0.30%
     lat (msec): 2=0.70%, 4=4.00%, 10=57.63%, 20=5.21%, 50=5.40%
     lat (msec): 100=0.70%, 250=0.43%, 500=0.16%, 750=0.03%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%
sla-4: (groupid=0, jobs=1): err= 0: pid=16705
  read : io=934752KB, bw=1558KB/s, iops=97, runt=600007msec
    clat (usec): min=187, max=2396K, avg=10236.87, stdev=26080.98
    bw (KB/s) : min=    0, max=11419, per=20.74%, avg=1625.28, stdev=1338.26
  write: io=304528KB, bw=519721B/s, iops=31, runt=600007msec
    clat (usec): min=7, max=7572, avg=67.29, stdev=117.27
    bw (KB/s) : min=    0, max=10772, per=22.06%, avg=575.17, stdev=560.68
  cpu          : usr=0.08%, sys=0.60%, ctx=63685, majf=0, minf=129
  IO depths    : 1=108.7%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=58422/19033, short=0/0
     lat (usec): 10=0.81%, 20=4.77%, 50=3.62%, 100=13.77%, 250=0.97%
     lat (usec): 500=1.45%, 750=0.64%, 1000=0.53%
     lat (msec): 2=1.75%, 4=4.71%, 10=53.48%, 20=6.92%, 50=5.53%
     lat (msec): 100=0.56%, 250=0.37%, 500=0.08%, 750=0.02%, 1000=0.01%
     lat (msec): 2000=0.01%, >=2000=0.01%

Run status group 0 (all jobs):
   READ: io=4593MB, aggrb=7836KB/s, minb=1412KB/s, maxb=1897KB/s, mint=600003msec, maxt=600180msec
  WRITE: io=1528MB, aggrb=2607KB/s, minb=471KB/s, maxb=635KB/s, mint=600003msec, maxt=600180msec

Disk stats (read/write):
  dm-0: ios=298995/596154, merge=0/0, ticks=3107720/433061790, in_queue=436170340, util=99.68%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
    sdb: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%

Summing the read and write IOPS for each job (why doesn't fio include that in its summary?) I get 647, which seems to exceed the specified service level by a wide margin. Is there anything obvious I'm missing, or are their metrics massively skewed for some workloads? (I'm specifically interested in PostgreSQL with data warehouse workloads.)
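
The summation described above can be reproduced with a short Python sketch. It is only an illustration: the per-job `read`/`write` lines are copied from the output above, the regular expression assumes this older fio output layout, and the helper names `sum_iops` and `sla_target` are made up for the example.

```python
import re

# Per-job summary lines copied from the fio output above (trimmed).
FIO_OUTPUT = """\
  read : io=1086MB, bw=1853KB/s, iops=115, runt=600003msec
  write: io=372416KB, bw=635586B/s, iops=38, runt=600003msec
  read : io=963360KB, bw=1605KB/s, iops=100, runt=600180msec
  write: io=326000KB, bw=556206B/s, iops=33, runt=600180msec
  read : io=827584KB, bw=1379KB/s, iops=86, runt=600012msec
  write: io=276192KB, bw=471358B/s, iops=28, runt=600012msec
  read : io=865920KB, bw=1443KB/s, iops=90, runt=600005msec
  write: io=285568KB, bw=487365B/s, iops=29, runt=600005msec
  read : io=934752KB, bw=1558KB/s, iops=97, runt=600007msec
  write: io=304528KB, bw=519721B/s, iops=31, runt=600007msec
"""

# Matches "read : ... iops=115" and "write: ... iops=38" lines.
LINE_RE = re.compile(r"^\s*(read|write)\s*:.*?\biops=(\d+)", re.MULTILINE)


def sum_iops(text):
    """Add up the per-job IOPS figures, separately for reads and writes."""
    totals = {"read": 0, "write": 0}
    for kind, iops in LINE_RE.findall(text):
        totals[kind] += int(iops)
    return totals


def sla_target(iops_per_tb, allocated_tb):
    """Expected IOPS for a volume under an IOPS-per-TB SLA."""
    return iops_per_tb * allocated_tb


totals = sum_iops(FIO_OUTPUT)
measured = totals["read"] + totals["write"]
target = sla_target(500, 0.47)
print(f"read={totals['read']} write={totals['write']} total={measured}")
print(f"SLA target={target:.0f} IOPS, measured/target={measured / target:.2f}")
```

Run against the lines above, this gives the same total of 647 IOPS, set against a target of 235.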

    
by rcoup 05.10.2011 / 01:10

1 answer


SQL and data warehouse workloads are closer to 8:1 in favor of writes, all in small blocks, all random. In any case, anything other than random reads is easy to cache and is probably not what is causing disk performance problems. Without knowing how they lay out their disks it is hard to help much, but consider asking them what they mean when they specify a "RAID5-SAN environment".

Since they specify the SLA as IOPS per TB, I would guess that each volume they provide is supposed to sit on a separate RAID-5, which allows more IOPS as they add volumes. Poor performance can easily be caused by bad RAID neighbors: volumes on the same RAID as yours that take more than their fair share of storage resources. The problem is that sometimes your SLA will be exceeded, but at other times you will have to deal with high latency.

Start by letting them know that you are unhappy with the performance; they may simply move you to a less heavily used RAID, which might solve all your problems. Also ask whether they have any RAID-10 storage available, and perhaps request a volume there instead of on RAID 5. If the problem comes back, consider getting your own storage or finding another host that can provide better performance.

    
by 05.10.2011 / 15:53