Hadoop: compress file in HDFS?


Question

I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create an MR job with an IdentityMapper and an IdentityReducer that uses LZO compression?

Answer

I suggest you write a MapReduce job that, as you say, just uses the identity mapper. While you are at it, you should consider writing the data out to sequence files to improve load performance. You can also store sequence files with block-level or record-level compression. You should see what works best for you, as each is optimized for different types of records.
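A minimal sketch of the kind of job described above: map-only with the identity `Mapper`, writing block-compressed `SequenceFile` output. The class name, paths, and the choice of block compression are assumptions for illustration; this needs the Hadoop jars (and, for the commented LZO line, the hadoop-lzo package) on the classpath and is untested against a live cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical class name; copies text input into a compressed SequenceFile.
public class CompressToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "compress-to-seqfile");
        job.setJarByClass(CompressToSequenceFile.class);

        // The base Mapper class passes every record through unchanged,
        // so it serves as the identity mapper.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0); // map-only: no IdentityReducer needed

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        // TextInputFormat emits (byte offset, line), so these match the mapper output.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // BLOCK compresses batches of records together (usually smaller output);
        // RECORD compresses each record individually (cheaper random access).
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.BLOCK);
        // With hadoop-lzo installed, the codec can be set explicitly, e.g.:
        // SequenceFileOutputFormat.setOutputCompressorClass(
        //         job, com.hadoop.compression.lzo.LzoCodec.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After verifying the compressed output, the original can be removed with the HDFS shell (`hadoop fs -rm <path>`), which is the "delete the original" step from the question.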
