Hadoop gzip compressed files
Question
I am new to Hadoop and am trying to process a Wikipedia dump. It's a 6.7 GB gzip-compressed XML file. I read that Hadoop supports gzip-compressed files, but such a file can only be handled by a single mapper, since only one mapper can decompress it. This seems to limit the processing. Is there an alternative? For example, decompressing the XML file, splitting it into multiple chunks, and recompressing each chunk with gzip.
I read about Hadoop and gzip at http://researchcomputing.blogspot.com/2008/04/hadoop-and-compressed-files.html
Thanks for your help.
Answer
A file compressed with the GZIP codec cannot be split because of the way the codec works. A single split in Hadoop can only be processed by a single mapper, so a single GZIP file can only be processed by a single mapper.
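The non-splittability is easy to see even outside Hadoop: a DEFLATE stream carries decoder state from byte 0, so a reader cannot jump into the middle of the file and start decompressing. A minimal shell sketch (the filename is made up for illustration):

```shell
# Create a small gzip file as a stand-in for a large dump.
seq 1 1000 | gzip > whole.gz

# Decompressing from the start of the file works fine:
gunzip -c whole.gz | tail -n 1

# Starting from an arbitrary mid-file offset does not -- the DEFLATE
# stream's header and dictionary state live at the beginning:
if tail -c +100 whole.gz | gunzip -c > /dev/null 2>&1; then
  echo "decompressed"
else
  echo "cannot start mid-stream"
fi
```

This is exactly why Hadoop cannot hand byte ranges of one gzip file to several mappers.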
There are at least three ways around that limitation:
- As a preprocessing step: decompress the file and recompress it with a splittable codec (LZO).
- As a preprocessing step: decompress the file, split it into smaller sets, and recompress them. (See this.)
- Use this patch for Hadoop (which I wrote), which allows for this: Splittable Gzip.
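The second option can be sketched in the shell. The line-based split below is only illustrative: the filenames are made up, a tiny generated file stands in for the 6.7 GB dump, and a real Wikipedia dump would need to be split on record boundaries (e.g. between `<page>` elements) rather than on arbitrary lines:

```shell
# Stand-in for the big dump (100 short lines, gzipped).
printf 'line %s\n' $(seq 1 100) | gzip > dump.xml.gz

# Decompress, split into fixed-size pieces, and recompress each piece.
gunzip -c dump.xml.gz | split -l 25 - chunk_   # 25 lines per chunk here
for f in chunk_*; do gzip "$f"; done

# Each chunk_*.gz is now an independent gzip file, so Hadoop can
# assign one mapper per file instead of one mapper for the whole dump.
ls chunk_*.gz
```

Nothing is lost in the round trip: concatenating the decompressed chunks reproduces the original stream.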
HTH