Hadoop: compress file in HDFS?


Question


I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create a MR job with an IdentityMapper and an IdentityReducer that uses LZO compression?

Answer


I suggest you write a MapReduce job that, as you say, just uses the identity mapper. While you are at it, you should consider writing the data out to sequence files to improve load performance. You can also store sequence files with block-level or record-level compression. You should see what works best for you, as the two are optimized for different kinds of records.
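The approach above can be sketched as a small map-only job driver. This is a hedged sketch, not a tested implementation: the class name and paths are illustrative, and `LzoCodec` is assumed to come from the third-party hadoop-lzo package (it is not part of core Hadoop), so it only works if LZO is already configured on the cluster, as in the question.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import com.hadoop.compression.lzo.LzoCodec; // assumption: hadoop-lzo is on the classpath

public class CompressToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "compress-to-seqfile");
        job.setJarByClass(CompressToSequenceFile.class);

        // The base Mapper class is an identity mapper: it emits each
        // (byte offset, line) input pair unchanged.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0); // map-only; no reducer needed just to compress

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // BLOCK compression usually compresses better than RECORD;
        // as the answer suggests, try both on your own data.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, LzoCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.BLOCK);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // original file
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // compressed output

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After the job succeeds and you have verified the output, you can delete the original with `hdfs dfs -rm /path/to/original`, which matches the questioner's goal of replacing the uncompressed file.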

