Writing to multiple folders in Hadoop?

Views: 103 Published: 2018/5/31 19:30:19 Tag: hadoop

Problem description

I am trying to separate my reducer output into different folders. My driver has the following code:

    FileOutputFormat.setOutputPath(job, new Path(output));
    // MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
    MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
    MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);

And then my reducer has the following code:

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
    mos.write("bar", key, NullWritable.get());
    mos.write("foobar", key, NullWritable.get());

But in the output, I see:

    output/foo-r-0001
    output/foo-r-0002
    output/foobar-r-0001
    output/bar-r-0001

But what I want is:

    output/foo/part-r-0001
    output/foo/part-r-0002
    output/bar/part-r-0001
    output/foobar/part-r-0001

How do I do this?
Thanks

Solution

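If you mean MultipleOutputs, the simplest way is to do one of the following from your reducer:

1. Use a named output together with a base output path: write(namedOutput, key, value, baseOutputPath).
2. Use only a base output path, without a named output: write(key, value, baseOutputPath).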

Your case is point 1, so please change the following:

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
    mos.write("bar", key, NullWritable.get());
    mos.write("foobar", key, NullWritable.get());

to:

    mos.write("foo", NullWritable.get(), new Text(jsn.toString()), "foo/part");
    mos.write("bar", key, NullWritable.get(), "bar/part");
    mos.write("foobar", key, NullWritable.get(), "foobar/part");

Here "foo/part", "bar/part" and "foobar/part" are the baseOutputPath values.
Hence the directories foo, bar and foobar are created, and the part-r-xxxxx files are written inside them.
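
To see why a slash in baseOutputPath produces a subdirectory, it may help to model how the reducer file name is put together. The sketch below uses plain Java with no Hadoop dependency. The class OutputNameSketch and the method reduceFileName are hypothetical names made up for this illustration, and the pattern (base path, then "-r-", then a five-digit partition number) is an assumption chosen to match the part-r-xxxxx names mentioned above.

```java
import java.util.Locale;

public class OutputNameSketch {

    // Hypothetical helper: builds "<baseOutputPath>-r-<5-digit partition>",
    // the assumed naming pattern for a reducer's output file.
    public static String reduceFileName(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        String outputDir = "output";
        // Named output only: the name itself acts as the base path,
        // so the file lands directly in the output directory.
        System.out.println(outputDir + "/" + reduceFileName("foo", 1));
        // Base path containing a slash: "foo" becomes a subdirectory
        // and "part" becomes the file name prefix.
        System.out.println(outputDir + "/" + reduceFileName("foo/part", 1));
    }
}
```

Under these assumptions, the first line printed corresponds to the flat layout seen in the question and the second to the desired per-folder layout.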

You might also try point 2 above, which does not need any named outputs at all.

Please get back to me if you need further clarification.

