Permutations with MapReduce


Question


Is there a way to generate permutations with MapReduce?

input file:

1  title1
2  title2
3  title3

my goal:

1,2  title1,title2
1,3  title1,title3
2,3  title2,title3

Solution

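Since the file has n inputs, pairing every line with every line should give n^2 outputs. (The unordered pairs in your example are a subset of those: n(n-1)/2 once self-pairs and mirrored duplicates are dropped.) It makes sense that you could have n tasks each perform n of those operations. I believe you could do this (assuming there is only one file):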

Put your input file into the DistributedCache so that it is available read-only to your Mappers/Reducers. Make an input split on each line of the file (as in WordCount). Each mapper call will thus receive one line (e.g. the title1 line in your example). Then read the lines out of the file in the DistributedCache and emit your key/value pairs: the key is the input line, and the values are the lines of the DistributedCache file, one pair per line.

In this model, you should only need a Map step.

Something like:

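  // imports go at the top of the enclosing source file
  import java.io.IOException;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public static class PermuteMapper
       extends Mapper<Object, Text, Text, Text>{

    private static final String IN_FILENAME = "file.txt";

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {

      String inputLine = value.toString();

      // set the property mapred.cache.files in your
      // configuration for the file to be available
      Path[] cachedPaths =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if ( cachedPaths != null && cachedPaths.length > 0
           && cachedPaths[0].getName().equals(IN_FILENAME) ) {
         // function defined elsewhere: returns every line of the cached file
         String[] cachedLines = getLinesFromPath(cachedPaths[0]);
         // pair the current input line with each cached line
         for (String line : cachedLines)
           context.write(new Text(inputLine), new Text(line));
      }
    }
  }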
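
The mapper above emits all n^2 pairs, including each line paired with itself and both orderings of every pair. To get the output shown in the question, the pairs still need filtering and formatting. Below is a minimal plain-Java sketch of that step, with no Hadoop dependency so it can be run on its own. The class name PairSketch, the method pairsFor, the integer-id comparison, and the tab separator are all assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PairSketch {

    // Hypothetical stand-in for the body of one map() call: pairs a single
    // input line with every cached line, keeping a pair only when the left
    // id is smaller than the right id. Assumes each line is "<int id> <title>".
    static List<String> pairsFor(String inputLine, List<String> cachedLines) {
        String[] left = inputLine.trim().split("\\s+", 2);
        int leftId = Integer.parseInt(left[0]);
        List<String> out = new ArrayList<>();
        for (String cached : cachedLines) {
            String[] right = cached.trim().split("\\s+", 2);
            int rightId = Integer.parseInt(right[0]);
            // Drops self-pairs (1,1) and mirrored duplicates (2,1 vs 1,2).
            if (leftId < rightId) {
                out.add(left[0] + "," + right[0] + "\t" + left[1] + "," + right[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 title1", "2 title2", "3 title3");
        // Each outer iteration plays the role of one map() invocation.
        for (String line : lines) {
            for (String pair : pairsFor(line, lines)) {
                System.out.println(pair);
            }
        }
    }
}
```

In the real mapper, the same comparison would sit inside the for loop just before context.write, so the job stays map-only. If the ids are not integers, a string comparison such as compareTo could be used instead.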
