From the Adurant white paper:
The benefits of this configuration include:
- Reduced Hadoop cluster overhead by reducing the replication factor to 2x
- Reduced storage (disk space) requirements by reducing the replication factor to 2x
- Increased the number of copies of data to 4x via the ZFS Storage Appliance
- Added data compression via the ZFS Storage Appliance
  - Further reducing storage space requirements even in a mirrored pool configuration
- Added read and write caching via the ZFS Storage Appliance decreasing I/O response times
- Added data protection (RAID 1) with no added overhead to the Hadoop cluster
- Added fault tolerance via the ZFS Storage Appliance’s clustered heads
And the results:
The findings of the Hadoop ZFS Proof of Concept testing clearly indicate that the ZFS Storage Appliance is more than able to handle current Hadoop workloads. Data processing was CPU bound, memory utilization was nominal, I/O utilization was nominal, and data was compressed by a minimum of 3.5x.
Of course, things like compression efficiency depend largely on your data, and performance depends not only on the design but also on the actual hardware. The paper also provides a summary of the configuration. You can replicate it on a smaller scale, with fewer nodes and a subset of your real data, and run your own performance tests.
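To get a feel for what the quoted figures mean for disk space, here is a rough back-of-the-envelope sketch in Python. It assumes the comparison baseline is the stock HDFS replication factor of 3 with no compression (the quoted text does not state the baseline), that the "4x copies" come from 2x HDFS replication on top of a 2-way mirror, and that the 3.5x compression ratio applies to your data too, which it may not. The function name and the 10 TB dataset size are purely illustrative.

```python
def raw_footprint_tb(logical_tb, hdfs_replication, mirror_copies=1, compression_ratio=1.0):
    """Estimate raw disk space (TB) needed to store `logical_tb` of data.

    Every HDFS replica is written once per mirror side, and each copy
    shrinks by the compression ratio before it hits disk.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return logical_tb * hdfs_replication * mirror_copies / compression_ratio


# Assumed baseline: plain HDFS, replication factor 3, no compression.
baseline = raw_footprint_tb(10, hdfs_replication=3)

# Configuration described in the paper: replication 2, mirrored pool
# (2 copies per replica, 4 copies total), compression of 3.5x.
zfs_backed = raw_footprint_tb(10, hdfs_replication=2, mirror_copies=2,
                              compression_ratio=3.5)

print(f"baseline:   {baseline:.1f} TB raw")
print(f"ZFS-backed: {zfs_backed:.1f} TB raw")
```

Under these assumptions, 10 TB of logical data needs about 30 TB raw in the baseline and roughly 11.4 TB in the ZFS-backed setup, despite holding four copies instead of three. Plug in the compression ratio you actually measure on a sample of your own data before drawing conclusions.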