Hadoop HDFS MapReduce output into MongoDB


Problem Description


I want to write a Java program that reads input from HDFS, processes it with MapReduce, and writes the output into MongoDB.

Here is the scenario:

  1. I have a Hadoop cluster with 3 datanodes.
  2. A Java program reads the input from HDFS and processes it with MapReduce.
  3. Finally, it writes the result into MongoDB.

Reading from HDFS and processing the data with MapReduce are straightforward, but I am stuck on writing the result into MongoDB. Is there a Java API for writing the result into MongoDB? Another question: since this is a Hadoop cluster, we don't know which datanode will run the Reducer task and produce the result, so is it possible to write the result into a MongoDB instance installed on a specific server?

If I wanted to write the result into HDFS, the code would look like this:

@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException 
{
    long sum = 0;
    for (LongWritable value : values) 
    {
        sum += value.get();
    }

    context.write(new Text(key), new LongWritable(sum));
}

Now I want to write the result into MongoDB instead of HDFS. How can I do that?

Solution

You want the «MongoDB Connector for Hadoop». Start from the examples that ship with it.
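With the connector, you keep reading input from HDFS and only switch the job's output side to the connector's MongoOutputFormat, pointing it at an output URI. The URI names the specific server, database, and collection that should receive the results, which also answers the question about targeting one particular host. Below is a minimal driver-class sketch based on the connector's documented usage; the class names WordCountToMongo and TokenizerMapper, the host mongo-host, and the namespace mydb.wordcounts are placeholders, and SumReducer stands for the reducer from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class WordCountToMongo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the connector which server/database/collection receives the output.
        MongoConfigUtil.setOutputURI(conf, "mongodb://mongo-host:27017/mydb.wordcounts");

        Job job = Job.getInstance(conf, "hdfs to mongodb");
        job.setJarByClass(WordCountToMongo.class);
        job.setMapperClass(TokenizerMapper.class);  // placeholder mapper
        job.setReducerClass(SumReducer.class);      // the reduce() shown in the question
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Input still comes from HDFS; only the output format changes.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        job.setOutputFormatClass(MongoOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Each (key, value) pair the reducer emits via context.write() is then converted by the connector into a MongoDB document, so the reducer body itself does not need to change.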

It's tempting to just add code in your Reducer that, as a side effect, inserts data into your database. Avoid this temptation. One reason to use a connector as opposed to just inserting data as a side effect of your reducer class is speculative execution: Hadoop can sometimes run two of the exact same reduce tasks in parallel, which can lead to extraneous inserts and duplicate data.
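As for the first question, whether a Java API exists at all: the official MongoDB Java driver is the general-purpose API for talking to MongoDB from Java code, and it is what the connector builds on. Here is a standalone sketch using the modern com.mongodb.client driver API, for example to inspect the collection after the job finishes (host, database, and collection names are again placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class VerifyOutput {
    public static void main(String[] args) {
        // Connect to the specific server hosting MongoDB.
        try (MongoClient client = MongoClients.create("mongodb://mongo-host:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("mydb").getCollection("wordcounts");
            // Print a few documents written by the MapReduce job.
            for (Document doc : coll.find().limit(5)) {
                System.out.println(doc.toJson());
            }
        }
    }
}

Use the driver from tooling outside the job; calling it from inside the reducer is exactly the side-effect pattern warned against above.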
