Writing output to different folders in Hadoop
Question
- I want to write two different types of output from the same reducer into two different directories.
I am able to use the MultipleOutputs feature in Hadoop to write to different files, but they all go to the same output folder.
I want to write each file from the same reduce to a different folder.
Is there a way of doing this?
If I try passing, for example, "hello/testfile" as the second argument, it shows an invalid argument, so I am not able to write to different folders.
- If the above is not possible, can the mapper read only specific files from the input folder?
Please help me. Thanks in advance!
Thanks for the reply. I am able to read a file successfully using the above method, but in distributed mode I am not able to do so. In the reducer, I have set:
mos.getCollector("data", report).collect(new Text(str_key), new Text(str_val));
(Using MultipleOutputs, and in the JobConf I tried using
FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data-r-00000*");
and also
FileInputFormat.setInputPaths(conf2, "/home/users/mlakshm/opchk285/data*");
But it gives the following error:
cause:org.apache.hadoop.mapred.InvalidInputException: Input Pattern hdfs://mentat.cluster:54310/home/users/mlakshm/opchk295/data-r-00000* matches 0 files
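The InvalidInputException means the glob expanded to zero files on HDFS, so the first thing to check is what actually exists in that directory (note also that the configured path says opchk285 while the error message shows opchk295, which could itself explain the zero matches). Hadoop's path expansion uses shell-style globs; as a Hadoop-free illustration of the matching semantics, the sketch below uses `java.nio`'s glob matcher, whose basic `*` behavior is similar:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobCheck {
    // Returns true if the file name matches the shell-style glob pattern.
    public static boolean matches(String glob, String fileName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // "*" matches zero or more characters, so the pattern matches the
        // exact prefix as well as longer names.
        System.out.println(matches("data-r-00000*", "data-r-00000")); // true
        System.out.println(matches("data*", "part-r-00000"));         // false
    }
}
```

The pattern itself is fine; "matches 0 files" simply means no file with that prefix was present at that path when the second job started.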
Accepted answer
Copy the MultipleOutputs code into your code base and loosen the restriction on allowable characters. I can't see any valid reason for the restriction anyway.
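To make "loosen the restriction" concrete: the old-API MultipleOutputs validates named-output names and, as far as I recall, accepts only alphanumeric characters, which is why "hello/testfile" is rejected. The sketch below is an illustrative re-implementation of that kind of check, not Hadoop's actual code; the loosened variant additionally admits '/' so a name can point into a subdirectory (assuming the rest of the copied code tolerates the slash when it builds the output path):

```java
public class NamedOutputCheck {
    // Mirrors the alphanumeric-only restriction: true only when every
    // character is a letter or digit. "hello/testfile" fails this check.
    public static boolean isValidTokenName(String name) {
        if (name == null || name.isEmpty()) return false;
        for (char c : name.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) return false;
        }
        return true;
    }

    // Loosened variant, as the answer suggests: also allow '/' so a named
    // output can resolve to a file inside a subdirectory of the job output.
    public static boolean isValidLoosenedName(String name) {
        if (name == null || name.isEmpty()) return false;
        for (char c : name.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '/') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidTokenName("testfile"));        // true
        System.out.println(isValidTokenName("hello/testfile"));  // false
        System.out.println(isValidLoosenedName("hello/testfile")); // true
    }
}
```

With the copied class using the loosened check, getCollector("hello/testfile", ...) would no longer be rejected up front; whether the subdirectory is created correctly still depends on how the copied code constructs the output path.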