Hadoop: How can I merge reducer outputs to a single file?


Question




I know that the "getmerge" command in the shell can do this work.

But what should I do if I want to merge these outputs after the job using the HDFS API for Java?

What I actually want is a single merged file on HDFS.

The only thing I can think of is to start an additional job after that.

Thanks!

Solution

But what should I do if I want to merge these outputs after the job by HDFS API for java?

Guessing, because I haven't tried this myself, but I think the method you are looking for is FileUtil.copyMerge, which is the method that FsShell invokes when you run the -getmerge command. FileUtil.copyMerge takes two FileSystem objects as arguments -- FsShell uses FileSystem.getLocal to retrieve the destination FileSystem, but I don't see any reason you couldn't instead use Path.getFileSystem on the destination to obtain an OutputStream.
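As a minimal, untested sketch of that idea: the paths below are hypothetical, and note that FileUtil.copyMerge exists in Hadoop 1.x/2.x but was removed in Hadoop 3.x, so this only applies to the older API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeOutputs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hypothetical paths -- replace with your job's actual output directory.
        Path srcDir = new Path("/user/me/job-output");
        Path dstFile = new Path("/user/me/merged-output.txt");

        // Both source and destination can live on HDFS: resolve each
        // FileSystem from its own Path instead of FileSystem.getLocal,
        // which is what FsShell does for -getmerge.
        FileSystem srcFs = srcDir.getFileSystem(conf);
        FileSystem dstFs = dstFile.getFileSystem(conf);

        // copyMerge(srcFS, srcDir, dstFS, dstFile, deleteSource, conf, addString)
        // deleteSource=false keeps the part files; addString=null adds no
        // separator between the concatenated files.
        FileUtil.copyMerge(srcFs, srcDir, dstFs, dstFile, false, conf, null);
    }
}
```

This still streams every part file through the client JVM, which is the limitation described below.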

That said, I don't think it wins you very much -- the merge is still happening in the local JVM; so you aren't really saving very much over -getmerge followed by -put.
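The -getmerge-then--put alternative mentioned above would look something like this (the HDFS paths are hypothetical):

```shell
# Concatenate all part files from the job's HDFS output into one local file...
hadoop fs -getmerge /user/me/job-output merged-output.txt
# ...then copy the merged file back up to HDFS.
hadoop fs -put merged-output.txt /user/me/merged-output.txt
```

Either way the data makes a round trip through the machine running the command, so neither approach merges the outputs "in place" on the cluster.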
