Iterate twice on values (MapReduce)


Problem description


I receive an iterator as argument and I would like to iterate on values twice.

public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
                   Context context)

Is it possible? If so, how? The signature is imposed by the framework I am using (namely Hadoop).

-- edit --
It turns out that the real signature of the reduce method takes an Iterable, not an Iterator. I was misled by this wiki page (which is actually the only non-deprecated (but wrong) example of wordcount I found).

Solution

If you want to iterate again, you have to cache the values from the iterator. At least the first iteration and the caching can be combined:

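// getIterator() and the two doSomething...() methods are placeholders
Iterator<IntWritable> it = getIterator();
List<IntWritable> cache = new ArrayList<IntWritable>();

// first loop and caching
while (it.hasNext()) {
   IntWritable value = it.next();
   doSomethingWithValue(value);
   // store a copy: Hadoop reuses the same IntWritable instance between next() calls
   cache.add(new IntWritable(value.get()));
}

// second loop
for (IntWritable value : cache) {
   doSomethingElseThatCantBeDoneInFirstLoop(value);
}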

(just to add an answer with code, knowing that you mentioned this solution in your own comment ;) )
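
One caveat when caching inside a Hadoop reducer: the framework reuses a single Writable instance for every value it hands out, so a cache of references ends up holding the same object many times. The sketch below illustrates the effect without depending on Hadoop. `MutableInt`, `ReusingIterator` and `ReuseDemo` are hypothetical stand-ins invented for this illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a mutable Writable such as IntWritable.
class MutableInt {
    private int value;
    MutableInt(int value) { this.value = value; }
    int get() { return value; }
    void set(int value) { this.value = value; }
}

// Hypothetical iterator that returns the SAME MutableInt from every next() call,
// overwriting its contents each time (the object-reuse behaviour assumed here).
class ReusingIterator implements Iterator<MutableInt> {
    private final Iterator<Integer> source;
    private final MutableInt shared = new MutableInt(0);

    ReusingIterator(List<Integer> data) { this.source = data.iterator(); }

    @Override
    public boolean hasNext() { return source.hasNext(); }

    @Override
    public MutableInt next() {
        shared.set(source.next());
        return shared;
    }
}

class ReuseDemo {
    // Caches references: every list entry points at the one shared object.
    static List<Integer> cacheByReference(Iterator<MutableInt> it) {
        List<MutableInt> cache = new ArrayList<>();
        while (it.hasNext()) {
            cache.add(it.next());
        }
        return unwrap(cache);
    }

    // Caches copies: each list entry is an independent snapshot of the value.
    static List<Integer> cacheByCopy(Iterator<MutableInt> it) {
        List<MutableInt> cache = new ArrayList<>();
        while (it.hasNext()) {
            cache.add(new MutableInt(it.next().get()));
        }
        return unwrap(cache);
    }

    // Second pass over the cache, reading the int out of each entry.
    private static List<Integer> unwrap(List<MutableInt> cache) {
        List<Integer> result = new ArrayList<>();
        for (MutableInt m : cache) {
            result.add(m.get());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(cacheByReference(new ReusingIterator(data))); // [3, 3, 3]
        System.out.println(cacheByCopy(new ReusingIterator(data)));      // [1, 2, 3]
    }
}
```

With real Hadoop types the copy would be something like `new IntWritable(value.get())`. Keep in mind that the cache holds every value for the key in memory.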


Why it's impossible without caching: an Iterator is just an implementation of an interface, and nothing requires the Iterator object to actually store its values. To iterate twice you would either have to reset the iterator (not possible) or clone it (again: not possible).

To give an example for an iterator where cloning/resetting wouldn't make any sense:

import java.util.Iterator;

// Produces ten random numbers; the values are generated on demand and never stored.
public class Randoms implements Iterator<Double> {

  private int counter = 10;

  @Override
  public boolean hasNext() {
     return counter > 0;
  }

  @Override
  public Double next() {
     counter--;
     return Math.random();
  }

  @Override
  public void remove() {
     throw new UnsupportedOperationException("delete not supported");
  }
}
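
As a minimal sketch of the caching approach applied to such an iterator, the code below drains a one-shot iterator into a list once and then walks the list twice. `OneShotRandoms` is a copy of the Randoms idea under a different name, and `IteratorCache` with its `drain` and `sum` methods is a hypothetical helper, not part of any library. It assumes Java 8 or later, where `Iterator.remove()` has a default implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical one-shot iterator modeled on the Randoms example:
// values are generated on demand and cannot be replayed.
class OneShotRandoms implements Iterator<Double> {
    private int counter = 10;

    @Override
    public boolean hasNext() { return counter > 0; }

    @Override
    public Double next() {
        if (counter <= 0) {
            throw new NoSuchElementException();
        }
        counter--;
        return Math.random();
    }
}

// Hypothetical helper for this illustration.
class IteratorCache {
    // Consumes the iterator completely and returns its values in order.
    static <T> List<T> drain(Iterator<T> it) {
        List<T> cache = new ArrayList<>();
        while (it.hasNext()) {
            cache.add(it.next());
        }
        return cache;
    }

    // One full pass over the cached values.
    static double sum(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        Iterator<Double> it = new OneShotRandoms();
        List<Double> cache = IteratorCache.drain(it);

        // The iterator is now exhausted, but the list can be traversed repeatedly.
        double firstPass = IteratorCache.sum(cache);
        double secondPass = IteratorCache.sum(cache);
        System.out.println(firstPass == secondPass); // true: both passes see the same values
        System.out.println(it.hasNext());            // false
    }
}
```

Creating a second `OneShotRandoms` instead of caching would not help here, because it would produce different numbers.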
