Iterate twice on values (MapReduce)

Question

I receive an iterator as an argument, and I would like to iterate over the values twice.

public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
                   Context context)

Is it possible? How? The signature is imposed by the framework I am using (namely Hadoop).

-- edit --
It turns out that the real signature of the reduce method takes an Iterable. I was misled by this wiki page (which is actually the only non-deprecated, but wrong, WordCount example I found).
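Note that receiving an Iterable instead of an Iterator does not by itself make a second pass possible. A for-each loop calls iterator() once per loop, and nothing stops an Iterable from returning an iterator that is already exhausted. My understanding is that the Iterable Hadoop passes to reduce() streams the values from the sorted map output and does not replay them, but treat that as an assumption. The sketch below uses a hypothetical OnePassIterable (not a Hadoop class) to show the effect without needing Hadoop on the classpath:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a framework-supplied Iterable: iterator() always
// returns the SAME underlying iterator, so the values can be walked only once.
class OnePassIterable<T> implements Iterable<T> {
    private final Iterator<T> shared;

    OnePassIterable(Iterator<T> shared) {
        this.shared = shared;
    }

    @Override
    public Iterator<T> iterator() {
        return shared; // no fresh iterator per call
    }
}

public class OnePassDemo {
    // Counts how many values each of two for-each passes sees.
    public static int[] countTwoPasses(Iterable<Integer> values) {
        int first = 0;
        for (Integer v : values) {
            first++;
        }
        int second = 0;
        for (Integer v : values) {
            second++;
        }
        return new int[] {first, second};
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        // A List creates a new iterator on every iterator() call.
        int[] reusable = countTwoPasses(data);
        // The one-pass Iterable hands out the same, eventually exhausted, iterator.
        int[] onePass = countTwoPasses(new OnePassIterable<>(data.iterator()));
        System.out.println(reusable[0] + " " + reusable[1]); // 3 3
        System.out.println(onePass[0] + " " + onePass[1]);   // 3 0
    }
}
```

So whether a second loop works depends on the concrete Iterable, not on the type in the signature.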

Answer

If you want to iterate again, you have to cache the values from the iterator. At least the first iteration and the caching can be combined:

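import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.IntWritable;

Iterator<IntWritable> it = getIterator(); // placeholder for the values passed to reduce()
List<IntWritable> cache = new ArrayList<IntWritable>();

// first loop and caching
while (it.hasNext()) {
   IntWritable value = it.next();
   doSomethingWithValue(value);
   // Hadoop reuses the same IntWritable instance for every value,
   // so cache a copy rather than the reference itself
   cache.add(new IntWritable(value.get()));
}

// second loop
for (IntWritable value : cache) {
   doSomethingElseThatCantBeDoneInFirstLoop(value);
}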
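One pitfall when caching in Hadoop: the Reducer documentation states that the framework reuses the key and value objects passed into reduce(), so storing the reference returned by next() leaves a cache full of pointers to one object holding the last value. The sketch below reproduces this without Hadoop; MutableInt and reusingIterator are hypothetical stand-ins for IntWritable and Hadoop's value iterator, and the "share in percent" computation is just an illustrative second pass that needs the total from the first one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CacheAndReplay {

    // Hypothetical mutable holder standing in for a Writable such as IntWritable.
    public static final class MutableInt {
        private int value;
        MutableInt(int value) { this.value = value; }
        int get() { return value; }
        void set(int value) { this.value = value; }
    }

    // Hypothetical iterator that reuses one holder object and only
    // overwrites its contents on each next(), mimicking object reuse.
    public static Iterator<MutableInt> reusingIterator(int... data) {
        MutableInt holder = new MutableInt(0);
        return new Iterator<MutableInt>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < data.length; }

            @Override
            public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                holder.set(data[i++]);
                return holder;
            }
        };
    }

    // First loop: sum and cache together. Second loop: needs the total,
    // so it cannot be merged into the first one.
    public static List<Integer> sharesInPercent(Iterator<MutableInt> it, boolean copy) {
        List<MutableInt> cache = new ArrayList<>();
        int sum = 0;
        while (it.hasNext()) {
            MutableInt v = it.next();
            sum += v.get();
            // Without a copy, every cache entry is the same reused holder.
            cache.add(copy ? new MutableInt(v.get()) : v);
        }
        List<Integer> shares = new ArrayList<>();
        for (MutableInt v : cache) {
            shares.add(v.get() * 100 / sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        System.out.println(sharesInPercent(reusingIterator(1, 2, 3, 4), true));  // [10, 20, 30, 40]
        System.out.println(sharesInPercent(reusingIterator(1, 2, 3, 4), false)); // [40, 40, 40, 40]
    }
}
```

The sum is correct in both runs because it is computed while iterating; only the replayed values go wrong when the references are cached instead of copies.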

(Just adding an answer with code, knowing that you mentioned this solution in your own comment ;) )

Why it's impossible without caching: an Iterator is just an object that implements an interface, and nothing in that interface requires the Iterator to actually store its values. To iterate twice, you would have to either reset the iterator (not possible) or clone it (again, not possible).

To give an example of an iterator where cloning or resetting wouldn't make any sense:

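import java.util.Iterator;
import java.util.NoSuchElementException;

public class Randoms implements Iterator<Double> {

  private int counter = 10;

  @Override
  public boolean hasNext() {
     return counter > 0;
  }

  @Override
  public Double next() {
     if (!hasNext()) {
        throw new NoSuchElementException();
     }
     counter--;
     return Math.random(); // a fresh value on every call; nothing is stored
  }

  @Override
  public void remove() {
     throw new UnsupportedOperationException("delete not supported");
  }
}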
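Caching is also what rescues an iterator like Randoms: once its values are copied into a list, the list can be walked as often as needed, even though the original iterator is spent. A minimal sketch, with RandomStream as a self-contained copy of the Randoms idea and cacheAll as a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReplayRandoms {

    // Same idea as Randoms: each value is produced on demand and never stored.
    public static class RandomStream implements Iterator<Double> {
        private int remaining;

        public RandomStream(int count) { this.remaining = count; }

        @Override
        public boolean hasNext() { return remaining > 0; }

        @Override
        public Double next() {
            if (!hasNext()) throw new NoSuchElementException();
            remaining--;
            return Math.random();
        }
    }

    // Copies everything the iterator yields into a list that can be
    // iterated any number of times.
    public static <T> List<T> cacheAll(Iterator<T> it) {
        List<T> cache = new ArrayList<>();
        while (it.hasNext()) {
            cache.add(it.next());
        }
        return cache;
    }

    public static void main(String[] args) {
        Iterator<Double> randoms = new RandomStream(10);
        List<Double> cached = cacheAll(randoms);
        double firstSum = 0, secondSum = 0;
        for (double d : cached) firstSum += d;
        for (double d : cached) secondSum += d;
        System.out.println(firstSum == secondSum); // true: both passes see the same values
        System.out.println(randoms.hasNext());     // false: the original iterator is spent
    }
}
```

The cost is memory proportional to the number of values, which is the trade-off any two-pass reducer over a non-replayable input has to accept.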
