Hadoop: How to send MultipleOutputs to 2 different paths / file systems?
Question
I've got MultipleOutputs configured to generate 2 named outputs. I'd like to send one to s3n://
and one to hdfs://
Is this possible?
Answer
This is not currently possible to do with the available API.
The MultipleOutputs
class in Hadoop MapReduce currently works only with sub-directory output names (under the configured output directory), which lets it take care of the side effects caused by speculative execution (see http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F).
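For reference, a minimal sketch of the supported usage described above, where both named outputs land as sub-paths of the job's single output directory (the output names "first" and "second" are illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class NamedOutputsReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // Both named outputs resolve to sub-directories of the one configured
      // output path -- there is no way to point one at a different file system.
      mos.write("first", key, value, "first/part");
      mos.write("second", key, value, "second/part");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }

  // At job-setup time, declare the two named outputs:
  public static void declareOutputs(Job job) {
    MultipleOutputs.addNamedOutput(job, "first", TextOutputFormat.class, Text.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "second", TextOutputFormat.class, Text.class, Text.class);
  }
}
```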
However, you could re-implement the class (or derive from it) to support this, as it's not impossible to achieve. You may need a more complex OutputCommitter implementation as well, if you plan on making your implementation support speculative execution.
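One possible workaround, not from the answer itself but a common pattern under these constraints: let the job's normal output go to hdfs:// and write the second output as a side file directly to s3n:// from the task, using a task-attempt-unique path so speculative duplicates do not clobber each other. The bucket and path names below are made up for illustration, and cleanup of failed attempts is left unhandled:

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TwoFileSystemReducer extends Reducer<Text, Text, Text, Text> {
  private FSDataOutputStream s3Out;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // Obtain a FileSystem handle for the second target by URI scheme.
    FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
    // Embed the task attempt ID so speculative copies of this task
    // write to distinct files instead of racing on one path.
    String attempt = context.getTaskAttemptID().toString();
    s3Out = s3.create(new Path("s3n://my-bucket/out/" + attempt));
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      context.write(key, value);                    // normal output -> hdfs:// job output path
      s3Out.writeBytes(key + "\t" + value + "\n");  // side file -> s3n://
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    // No OutputCommitter covers the side file: committing successful
    // attempts and deleting failed ones is your responsibility.
    s3Out.close();
  }
}
```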