Iterate twice on values (MapReduce)
Question
I receive an iterator as an argument and I would like to iterate over its values twice.
public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
Context context)
Is it possible? How? The signature is imposed by the framework I am using (namely Hadoop).
-- edit --
In the end, the real signature of the reduce method takes an Iterable. I was misled by this wiki page (which is actually the only non-deprecated, but wrong, wordcount example I found).
Recommended answer
We have to cache the values from the iterator if we want to iterate again. At least we can combine the first iteration with the caching:
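Iterator<IntWritable> it = getIterator();
List<IntWritable> cache = new ArrayList<IntWritable>();

// first loop and caching
while (it.hasNext()) {
    IntWritable value = it.next();
    doSomethingWithValue(value);
    // In a Hadoop reducer the framework reuses the same IntWritable
    // instance between calls to next(), so cache a copy, not the reference.
    cache.add(new IntWritable(value.get()));
}

// second loop
for (IntWritable value : cache) {
    doSomethingElseThatCantBeDoneInFirstLoop(value);
}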
(Just adding an answer with code, knowing that you mentioned this solution in your own comment ;) )
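To see the cache-then-replay pattern run outside Hadoop, here is a minimal sketch that computes a variance, which needs the mean from a first pass before the second pass can start. The class name TwoPass and the method variance are hypothetical, and plain Integer values stand in for IntWritable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TwoPass {
    // Hypothetical helper: population variance of the values behind a one-shot iterator.
    static double variance(Iterator<Integer> values) {
        List<Integer> cache = new ArrayList<>();
        long sum = 0;
        // First pass: consume the iterator, accumulate the sum, cache each value.
        while (values.hasNext()) {
            int v = values.next();
            sum += v;
            cache.add(v);
        }
        if (cache.isEmpty()) {
            return 0.0;
        }
        double mean = (double) sum / cache.size();
        // Second pass: replay the values from the cache, not from the iterator.
        double squared = 0.0;
        for (int v : cache) {
            squared += (v - mean) * (v - mean);
        }
        return squared / cache.size();
    }

    public static void main(String[] args) {
        Iterator<Integer> it = Arrays.asList(2, 4, 4, 4, 5, 5, 7, 9).iterator();
        System.out.println(variance(it)); // prints 4.0
    }
}
```

The trade-off is memory: the cache holds every value for the key at once, so this only works when a single key's values fit in the heap.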
Why it's impossible without caching: an Iterator is just something that implements an interface, and nothing requires the Iterator object to actually store its values. To iterate twice you would have to either reset the iterator (not possible) or clone it (again, not possible).
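A small sketch of that one-shot behaviour, using only java.util: once an Iterator is drained, nothing in the interface lets you rewind it. The class name OneShot and the helper drain are made up for the illustration. The last line shows why an Iterable is a different situation: a List hands out a fresh iterator on every call, although whether a particular Iterable does so depends on its implementation, and no claim is made here about Hadoop's.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OneShot {
    // Hypothetical helper: counts how many elements the iterator still yields, consuming it.
    static int drain(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3);
        Iterator<Integer> it = source.iterator();
        System.out.println(drain(it));                // 3
        System.out.println(drain(it));                // 0: the same iterator is exhausted
        System.out.println(drain(source.iterator())); // 3: the List creates a new iterator
    }
}
```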
To give an example of an iterator for which cloning or resetting wouldn't make any sense:
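import java.util.Iterator;

public class Randoms implements Iterator<Double> {
    // number of random values still to be produced
    private int counter = 10;

    @Override
    public boolean hasNext() {
        return counter > 0;
    }

    @Override
    public Double next() {
        counter--;
        // each call produces a new value; nothing is stored, so there is nothing to replay
        return Math.random();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("delete not supported");
    }
}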