Hadoop MapReduce: iterating over the input values of a reduce call


Problem description

I'm testing a simple MapReduce application, but I'm stuck trying to understand what happens when I iterate over the input values of a reduce call.

This is the piece of code that behaves strangely:

public void reduce(Text key, Iterable<E> values, Context context)
    throws IOException, InterruptedException {

    Iterator<E> statesIter = values.iterator();
    // keeps a reference to the first value, not a copy of it
    E first = statesIter.next();

    while (statesIter.hasNext()) {
        E state = statesIter.next();

        System.out.println(first.toString());
        // some other stuff
    }
    // some other stuff
}

Nothing unusual so far, except that each println invocation actually prints a different string. In other words, every time I call next(), the object referenced by first changes.

Why does this happen?

Solution

It's somewhat counter-intuitive, but it is documented in the API docs: Hadoop reuses the key and value objects, so each call to next() refills the same instance with the next record instead of handing you a new object. If you want to keep a value around, clone it.
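To make the effect concrete, here is a minimal sketch in plain Java that needs no Hadoop dependency. It only simulates the reuse: `State`, `ReusingIterator` and `ReuseDemo` are hypothetical names invented for this illustration, not Hadoop classes, and the iterator is a stand-in for the one Hadoop passes to reduce, not its real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReuseDemo {

    /** Hypothetical mutable value type standing in for a Hadoop Writable. */
    static final class State {
        private String payload = "";
        State() {}
        State(State other) { this.payload = other.payload; } // copy constructor
        void set(String payload) { this.payload = payload; }
        @Override public String toString() { return payload; }
    }

    /** Returns the SAME State instance from every next() call, refilled each time. */
    static final class ReusingIterator implements Iterator<State> {
        private final Iterator<String> source;
        private final State shared = new State();
        ReusingIterator(List<String> records) { this.source = records.iterator(); }
        @Override public boolean hasNext() { return source.hasNext(); }
        @Override public State next() { shared.set(source.next()); return shared; }
    }

    /** Mirrors the loop in the question: 'first' is only a reference. */
    static List<String> firstWithoutCopy(List<String> records) {
        Iterator<State> it = new ReusingIterator(records);
        State first = it.next();
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            it.next();                  // overwrites the object 'first' points to
            seen.add(first.toString());
        }
        return seen;
    }

    /** Same loop, but 'first' is a private copy taken before advancing. */
    static List<String> firstWithCopy(List<String> records) {
        Iterator<State> it = new ReusingIterator(records);
        State first = new State(it.next());
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            it.next();
            seen.add(first.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("s1", "s2", "s3");
        System.out.println(firstWithoutCopy(records)); // [s2, s3]
        System.out.println(firstWithCopy(records));    // [s1, s1]
    }
}
```

In a real reducer the copy step depends on the value type. For a Writable, `WritableUtils.clone(value, context.getConfiguration())` is one option, and `Text` has a copy constructor (`new Text(value)`). Whether either fits your own type `E` is an assumption you would need to check.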
