MapReduce difference in count


Problem description

I'm trying to write a program that outputs the difference between the counts in two columns. My data looks like this:

2,1
2,3
1,2
3,1
4,2

I want to count how often each key occurs in col1 and how often it occurs in col2, then take the difference (col1 count minus col2 count). The output should look like this:

1,-1
2,0
3,0
4,1

Can this be done in a single MapReduce job (one mapper, one reducer)?

Solution

Yes, one mapper and one reducer are enough. In the mapper, emit two key/value pairs for each input line: one keyed by the col1 value and one keyed by the col2 value. Each value is a pair {col1 count, col2 count}, like so:

2,1 -> 2:{1, 0} and 1:{0, 1}

2,3 -> 2:{1, 0} and 3:{0, 1}

1,2 -> 1:{1, 0} and 2:{0, 1}

3,1 -> 3:{1, 0} and 1:{0, 1}

4,2 -> 4:{1, 0} and 2:{0, 1}
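
The emission rule above can be sketched in plain Java without any Hadoop dependency. This is only an illustration: the class `MapSketch` and the helper `mapLine` are hypothetical names that do not appear in the answer, and an ordinary list of entries stands in for `context.write`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapSketch {

    // Hypothetical helper: returns the two pairs a map call would emit for one line.
    static List<Map.Entry<String, int[]>> mapLine(String line) {
        String[] cols = line.trim().split(",");
        List<Map.Entry<String, int[]>> out = new ArrayList<>();
        // The col1 key counts once in the first slot.
        out.add(new SimpleEntry<>(cols[0].trim(), new int[] {1, 0}));
        // The col2 key counts once in the second slot.
        out.add(new SimpleEntry<>(cols[1].trim(), new int[] {0, 1}));
        return out;
    }

    public static void main(String[] args) {
        // Prints 2:{1, 0} and then 1:{0, 1}, matching the first line of the table above.
        for (Map.Entry<String, int[]> e : mapLine("2,1")) {
            int[] v = e.getValue();
            System.out.println(e.getKey() + ":{" + v[0] + ", " + v[1] + "}");
        }
    }
}
```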

Then, in the reducer, each reduce call receives one key together with all the values emitted for it. Summing the first elements and subtracting the sum of the second elements gives the result:

1 -> {0, 1}, {1, 0}, {0, 1} (1 minus 2 gives -1)

2 -> {1, 0}, {1, 0}, {0, 1}, {0, 1} (2 minus 2 gives 0)

3 -> {0, 1}, {1, 0} (1 minus 1 gives 0)

4 -> {1, 0} (1 minus 0 gives 1)
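
To check the arithmetic end to end, the whole flow (map, group by key, reduce) can be simulated in memory. This is a minimal sketch, not a Hadoop job: `CountDifferenceSketch` and `countDifference` are hypothetical names, and a `TreeMap` stands in for the shuffle that groups values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountDifferenceSketch {

    // Hypothetical helper: runs the map, group-by-key, and reduce steps in memory.
    static Map<String, Integer> countDifference(List<String> lines) {
        // "Map" and "shuffle": collect the emitted pairs under their key, keys kept sorted.
        Map<String, List<int[]>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split(",");
            grouped.computeIfAbsent(cols[0].trim(), k -> new ArrayList<>()).add(new int[] {1, 0});
            grouped.computeIfAbsent(cols[1].trim(), k -> new ArrayList<>()).add(new int[] {0, 1});
        }
        // "Reduce": add the col1 counts and subtract the col2 counts for each key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<int[]>> e : grouped.entrySet()) {
            int count = 0;
            for (int[] pair : e.getValue()) {
                count += pair[0];
                count -= pair[1];
            }
            result.put(e.getKey(), count);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("2,1", "2,3", "1,2", "3,1", "4,2");
        // Prints one "key,difference" line per key.
        countDifference(input).forEach((k, v) -> System.out.println(k + "," + v));
    }
}
```

Run on the five sample lines from the question, this prints 1,-1, 2,0, 3,0 and 4,1, which is the output the question asks for.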

Update:

Here is a Hadoop example (it is untested and may need some tweaking to work):

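// TheMapper.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.ArrayPrimitiveWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TheMapper extends Mapper<LongWritable, Text, Text, ArrayPrimitiveWritable> {

    @Override
    protected void map(LongWritable offset, Text value, Context context)
    throws IOException, InterruptedException {

        // Each input line has the form "col1,col2".
        StringTokenizer tok = new StringTokenizer( value.toString(), "," );

        // The col1 key contributes {1, 0}: one occurrence in the first column.
        Text col1 = new Text( tok.nextToken().trim() );
        context.write( col1, toArray(1, 0) );

        // The col2 key contributes {0, 1}: one occurrence in the second column.
        Text col2 = new Text( tok.nextToken().trim() );
        context.write( col2, toArray(0, 1) );
    }

    // Wraps the two counts in a Writable so they can be sent to the reducer.
    private ArrayPrimitiveWritable toArray(int v1, int v2) {
        return new ArrayPrimitiveWritable( new int[]{v1, v2} );
    }
}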

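// TheReducer.java
import java.io.IOException;

import org.apache.hadoop.io.ArrayPrimitiveWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TheReducer extends Reducer<Text, ArrayPrimitiveWritable, Text, Text> {

  @Override
  public void reduce(Text key, Iterable<ArrayPrimitiveWritable> values, Context context)
  throws IOException, InterruptedException {

      // Add the col1 counts and subtract the col2 counts for this key.
      int count = 0;
      for ( ArrayPrimitiveWritable value : values ) {
          int[] counts = (int[]) value.get();
          count += counts[0];
          count -= counts[1];
      }

      // The default TextOutputFormat writes key and value separated by a tab;
      // set mapreduce.output.textoutputformat.separator to "," for "key,count" lines.
      context.write( key, new Text( String.valueOf(count) ) );
  }
}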
