Generating Multiple Output Files with Hadoop 0.20+


Problem Description

I am trying to output the results of my reducer to multiple files. The main results are all contained in one file, and the remaining results are split into separate files based on their category. I know that in 0.18 you can do this with MultipleOutputs, and it has not been removed. However, I am trying to make my application 0.20+ compliant. The existing MultipleOutputs functionality still requires JobConf (whereas my application uses Job and Configuration). How can I generate multiple output files based on the key?

Solution

Support for MultipleOutputs is not in the new (org.apache.hadoop.mapreduce) API in 0.20. You will need to use the older API.
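If you stay on the older (org.apache.hadoop.mapred) API for now, the usual pattern looks roughly like the sketch below. The Text types, the "categories" named output, and the reducer itself are illustrative assumptions, not code from the question:

// A minimal sketch using the old-API MultipleOutputs (org.apache.hadoop.mapred.lib).
// Assumes Text keys/values; the named output "categories" is an illustrative name.
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class OldApiCategoryReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private MultipleOutputs mos;

  @Override
  public void configure(JobConf conf) {
    // The driver must register the named output before submitting the job, e.g.:
    // MultipleOutputs.addNamedOutput(conf, "categories",
    //     TextOutputFormat.class, Text.class, Text.class);
    mos = new MultipleOutputs(conf);
  }

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text value = values.next();
      // Normal part-* output for the combined results.
      output.collect(key, value);
      // Additional copy to the named output's own files.
      mos.getCollector("categories", reporter).collect(key, value);
    }
  }

  @Override
  public void close() throws IOException {
    // Flushes and closes all the named-output record writers.
    mos.close();
  }
}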

It has been added in 0.21, which is currently unreleased, as org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
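Once the 0.21 new-API class is available, the same idea looks roughly like the sketch below; again, the word-count-style types, the "categories" name, and the per-key base path are assumptions made for illustration:

// A minimal sketch using the new-API MultipleOutputs (0.21+,
// org.apache.hadoop.mapreduce.lib.output). Types and names are illustrative.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class NewApiCategoryReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private MultipleOutputs<Text, IntWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, IntWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    IntWritable total = new IntWritable(sum);
    // Normal part-r-* output for the combined results.
    context.write(key, total);
    // Extra output under a file name derived from the key.
    // The driver must register the named output first, e.g.:
    // MultipleOutputs.addNamedOutput(job, "categories",
    //     TextOutputFormat.class, Text.class, IntWritable.class);
    mos.write("categories", key, total, key.toString());
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();
  }
}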



This thread on the mailing list discusses this problem.


