How to implement sort in hadoop?


Problem description

My problem is sorting the values in a file. The keys and values are integers, and I need to keep each sorted value paired with its original key.

key   value
1     24
3     4
4     12
5     23

output:

1     24
5     23
4     12
3     4

I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do it with MapReduce?

Solution

You can probably do this (I'm assuming you are using Java here)

From the map, emit the value as the key and the original key as the value, like this -

context.write(new IntWritable(24), new IntWritable(1));
context.write(new IntWritable(4), new IntWritable(3));
context.write(new IntWritable(12), new IntWritable(4));
context.write(new IntWritable(23), new IntWritable(5));

So, all the values that need to be sorted should be the keys in your MapReduce job. By default, Hadoop sorts keys in ascending order.
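For completeness, here is a minimal sketch of such a mapper, assuming the input is a plain text file with the key and value separated by whitespace and using IntWritable to match the integer data in the question (the class and variable names are just illustrative, not part of the original answer):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SwapMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // An input line looks like "1   24": the first token is the original key,
        // the second token is the value we want to sort on.
        String[] parts = line.toString().trim().split("\\s+");
        int originalKey = Integer.parseInt(parts[0]);
        int value = Integer.parseInt(parts[1]);
        // Emit the value as the MapReduce key so Hadoop sorts on it.
        context.write(new IntWritable(value), new IntWritable(originalKey));
    }
}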

Hence, to sort in descending order, either use the built-in decreasing comparator (if your keys are LongWritable):

job.setSortComparatorClass(LongWritable.DecreasingComparator.class);

Or, set a custom descending sort comparator for your key type, which goes something like this in your job:

public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        // Register the key class this comparator is used with.
        super(LongWritable.class, true);
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        // Invert the natural (ascending) comparison to get descending order.
        return -1 * key1.compareTo(key2);
    }
}
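If you go that route, you would presumably register it in your driver before submitting the job, along these lines (assuming the new-API Job object):

job.setSortComparatorClass(DescendingKeyComparator.class);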

The shuffle and sort phase in Hadoop will then take care of sorting your keys (24, 4, 12, 23) in descending order.
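To get the output back into the "original key, value" layout shown in the question, the reducer can simply swap the pair again before writing. A minimal sketch, again with illustrative names and IntWritable types:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class SwapBackReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    @Override
    protected void reduce(IntWritable value, Iterable<IntWritable> originalKeys, Context context)
            throws IOException, InterruptedException {
        // The incoming key is the (sorted) value; each element of the iterable is an
        // original key that carried that value. Write them back as "originalKey  value".
        for (IntWritable originalKey : originalKeys) {
            context.write(originalKey, value);
        }
    }
}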

After comment:

If you require a descending comparator for IntWritable keys, you can create one and use it like this -

job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

In case you are using the older JobConf API, use this to set it:

jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);

Put the following code in your driver class (note that the comparator class is defined outside main(), not inside it) -

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}

//this class is defined outside of main not inside
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */ 
    public static class DecreasingComparator extends Comparator {
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}
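The YourDriver class referenced by ToolRunner is not shown in the answer; a minimal sketch of what its run() method might look like, assuming the Tool/Configured pattern and the illustrative mapper, reducer and comparator classes sketched above:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

public class YourDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "sort values descending");
        job.setJarByClass(YourDriver.class);
        job.setMapperClass(SwapMapper.class);        // hypothetical mapper sketched above
        job.setReducerClass(SwapBackReducer.class);  // hypothetical reducer sketched above
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // Sort the swapped keys (i.e. the original values) in descending order.
        job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}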
