Best splittable compression for Hadoop input = bz2?

Question

We realized a bit too late that archiving our files in GZip format for Hadoop processing wasn't such a great idea. GZip isn't splittable; for reference, here are the problems, which I won't repeat:

My question is: is BZip2 the best archival compression that will allow a single archive file to be processed in parallel by Hadoop? GZip definitely is not, and from my reading LZO has some problems.

Recommended answer

BZIP2 is splittable in Hadoop. It provides a very good compression ratio, but it is not optimal in CPU time and throughput, because compression is very CPU-intensive.
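To illustrate why a bzip2 file can be split at all, here is a minimal Python sketch. It relies on one property of the bzip2 format: every compressed block opens with the same 48-bit marker (0x314159265359), and blocks are not byte-aligned, so a reader has to scan bit by bit. The helper names `make_sample` and `find_block_offsets` are made up for this example, and the code only locates block boundaries; it is not how Hadoop's own codec is implemented.

```python
import bz2
import random

# 48-bit marker that opens every bzip2 block (the BCD digits of pi).
BLOCK_MAGIC = 0x314159265359


def make_sample(lines: int = 30_000, seed: int = 0) -> bytes:
    """Build deterministic CSV-like text, large enough to span several blocks."""
    rng = random.Random(seed)
    rows = (f"{rng.randrange(10**6)},{rng.randrange(10**6)}\n" for _ in range(lines))
    return "".join(rows).encode("ascii")


def find_block_offsets(compressed: bytes) -> list[int]:
    """Return the bit offset of every block marker in a bzip2 stream."""
    offsets = []
    window = 0
    mask = (1 << 48) - 1
    bit_pos = 0
    for byte in compressed:
        for shift in range(7, -1, -1):  # bzip2 packs bits most-significant first
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


if __name__ == "__main__":
    data = make_sample()
    # compresslevel=1 selects the smallest block size (about 100 kB of input per block).
    compressed = bz2.compress(data, compresslevel=1)
    offsets = find_block_offsets(compressed)
    print(f"{len(data)} bytes in, {len(compressed)} bytes out, {len(offsets)} blocks")
    for off in offsets:
        print(f"block starts at bit {off} (byte {off // 8}, bit {off % 8})")
```

A split-aware reader can seek to an arbitrary byte offset, scan forward to the next marker, and start decompressing there. GZip has no comparable marker to resynchronize on, which is why a single .gz file ends up in one mapper.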

LZO is splittable in Hadoop: with hadoop-lzo you get splittable LZO-compressed files. You need external .lzo.index files to be able to process them in parallel. The library provides the means to generate these indexes either locally or in a distributed manner.
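The role of an external index can be sketched in a few lines of Python. This is a conceptual model only: zlib stands in for LZO (the Python standard library has no LZO codec), the length-prefixed container layout is invented for the example, and the index is assumed to be a flat list of 8-byte big-endian block offsets, which may not match the real .lzo.index format byte for byte. `write_blocks` and `read_block` are hypothetical helper names.

```python
import struct
import zlib

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (arbitrary choice for the demo)


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> tuple[bytes, bytes]:
    """Compress data as independent blocks; return (container, index)."""
    container = bytearray()
    offsets = []
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        offsets.append(len(container))  # where this block begins in the container
        container += struct.pack(">I", len(block)) + block  # 4-byte length prefix
    # The index is kept separately from the data, like a side-car file.
    index = b"".join(struct.pack(">Q", off) for off in offsets)
    return bytes(container), index


def read_block(container: bytes, index: bytes, n: int) -> bytes:
    """Decompress block n without touching any other block."""
    (offset,) = struct.unpack_from(">Q", index, 8 * n)
    (length,) = struct.unpack_from(">I", container, offset)
    return zlib.decompress(container[offset + 4:offset + 4 + length])


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    container, index = write_blocks(data)
    print(f"{len(index) // 8} blocks indexed")
    # Each block could be handed to a different worker.
    print(read_block(container, index, 2) == data[2 * BLOCK_SIZE:3 * BLOCK_SIZE])
```

Without the index, a reader would have to walk the container from the start to find where block 2 begins. With it, each worker can jump straight to its own blocks, which is the property the .lzo.index files give Hadoop.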

LZ4 is splittable in Hadoop: with hadoop-4mc you get splittable 4mc-compressed files. You don't need any external index, and you can generate archives with the provided command-line tool or from Java/C code, inside or outside Hadoop. 4mc makes LZ4 available on Hadoop at any speed/compression-ratio level: from a fast mode reaching 500 MB/s compression speed up to high/ultra modes that provide a higher compression ratio, almost comparable with GZIP's.
