Hadoop HDFS MapReduce output into MongoDb
Question
I want to write a Java program which reads input from HDFS, processes it using MapReduce, and writes the output into MongoDb.
Here is the scenario:
- I have a Hadoop cluster with 3 datanodes.
- A Java program reads the input from HDFS and processes it using MapReduce.
- Finally, it writes the result into MongoDb.
Actually, reading from HDFS and processing it with MapReduce are simple, but I am stuck on writing the result into MongoDb. Is there a Java API that supports writing the result into MongoDB? Another question: since it is a Hadoop cluster, we don't know which datanode will run the Reducer task and generate the result. Is it possible to write the result into a MongoDb instance installed on a specific server?
If I want to write the result into HDFS, the code will be like this:
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException
{
    long sum = 0;
    for (LongWritable value : values)
    {
        sum += value.get();
    }
    context.write(new Text(key), new LongWritable(sum));
}
Now I want to write the result into MongoDb instead of HDFS. How can I do that?
You want the «MongoDB Connector for Hadoop»; its repository ships with examples.
It's tempting to just add code in your Reducer that, as a side effect, inserts data into your database. Avoid this temptation. One reason to use a connector as opposed to just inserting data as a side effect of your reducer class is speculative execution: Hadoop can sometimes run two of the exact same reduce tasks in parallel, which can lead to extraneous inserts and duplicate data.
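To make the connector approach concrete, here is a minimal job-driver sketch. It assumes the mongo-hadoop connector (the `mongo-hadoop-core` artifact) is on the classpath; `MongoConfigUtil` and `MongoOutputFormat` are the connector's classes, while the MongoDB host, the `testdb.wordcount` database/collection, and the `MyMapper`/`MyReducer` class names are placeholders for your own job. The reducer from the question can stay as-is: only the output side of the job changes, and per the connector's examples the reduce output key becomes the document's `_id` in the target collection.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class HdfsToMongoJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Point the connector at the target collection. The MongoDB server
        // does NOT have to live on a datanode: whichever node runs the
        // reducer writes over the network, so it only needs this URI to be
        // reachable from the cluster.
        MongoConfigUtil.setOutputURI(conf,
                "mongodb://mongo-host:27017/testdb.wordcount");

        // Extra safety against the duplicate-insert problem described
        // above: disable speculative execution for reduce tasks.
        conf.setBoolean("mapreduce.reduce.speculative", false);

        Job job = Job.getInstance(conf, "hdfs-to-mongo");
        job.setJarByClass(HdfsToMongoJob.class);

        job.setMapperClass(MyMapper.class);    // placeholder: your mapper
        job.setReducerClass(MyReducer.class);  // placeholder: the reducer shown above

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Input still comes from HDFS; only the output format changes.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        job.setOutputFormatClass(MongoOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is a sketch of job wiring, not a drop-in program: it needs a running cluster, a reachable MongoDB instance, and your mapper/reducer classes before it will do anything useful.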