Can Hadoop mapper produce multiple keys in output?

Question




Can a single Mapper class produce multiple key-value pairs (of same type) in a single run?

We output the key-value pair in the mapper like this:

context.write(key, value);

Here's a trimmed down (and exemplified) version of the Key:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.ObjectWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;


public class MyKey extends ObjectWritable implements WritableComparable<MyKey> {

    public enum KeyType {
        KeyType1,
        KeyType2
    }

    private KeyType keyTupe;
    private Long field1;
    private Integer field2 = -1;
    private String field3 = "";


    public KeyType getKeyType() {
        return keyTupe;
    }

    public void settKeyType(KeyType keyType) {
        this.keyTupe = keyType;
    }

    public Long getField1() {
        return field1;
    }

    public void setField1(Long field1) {
        this.field1 = field1;
    }

    public Integer getField2() {
        return field2;
    }

    public void setField2(Integer field2) {
        this.field2 = field2;
    }


    public String getField3() {
        return field3;
    }

    public void setField3(String field3) {
        this.field3 = field3;
    }

    @Override
    public void readFields(DataInput datainput) throws IOException {
        keyTupe = KeyType.valueOf(datainput.readUTF());
        field1 = datainput.readLong();
        field2 = datainput.readInt();
        field3 = datainput.readUTF();
    }

    @Override
    public void write(DataOutput dataoutput) throws IOException {
        dataoutput.writeUTF(keyTupe.toString());
        dataoutput.writeLong(field1);
        dataoutput.writeInt(field2);
        dataoutput.writeUTF(field3);
    }

    @Override
    public int compareTo(MyKey other) {
        if (getKeyType().compareTo(other.getKeyType()) != 0) {
            return getKeyType().compareTo(other.getKeyType());
        } else if (getField1().compareTo(other.getField1()) != 0) {
            return getField1().compareTo(other.getField1());
        } else if (getField2().compareTo(other.getField2()) != 0) {
            return getField2().compareTo(other.getField2());
        } else if (getField3().compareTo(other.getField3()) != 0) {
            return getField3().compareTo(other.getField3());
        } else {
            return 0;
        }
    }

    public static class MyKeyComparator extends WritableComparator {
        public MyKeyComparator() {
            super(MyKey.class);
        }

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }

    static { // register this comparator
        WritableComparator.define(MyKey.class, new MyKeyComparator());
    }
}

And this is how we tried to output both keys in the Mapper:

MyKey key1 = new MyKey();
key1.settKeyType(KeyType.KeyType1);
key1.setField1(1L);
key1.setField2(23);

MyKey key2 = new MyKey();
key2.settKeyType(KeyType.KeyType2);
key2.setField1(1L);
key2.setField3("abc");

context.write(key1, value1);
context.write(key2, value2);

Our job's output format class is: org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

I mention this because in some other output format classes I've seen the output not being appended, but only committed, in their implementation of the write method.

Also, we are using the following classes for Mapper and Context: org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Context.

Solution

Writing to the context multiple times in one map task is perfectly fine.

However, you may have several problems with your key class. Whenever you implement WritableComparable for a key, you should also implement equals(Object) and hashCode() methods. These aren't part of the WritableComparable interface, since they are defined in Object, but you must provide implementations.

The default partitioner uses the hashCode() method to decide which reducer each key/value pair goes to. If you don't provide a sane implementation, you can get strange results.
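For reference, Hadoop's default HashPartitioner derives the partition index from hashCode() with the formula below. This is a standalone sketch of that arithmetic, not the Hadoop class itself:

```java
class HashPartitionSketch {
    // Mirrors the logic of Hadoop's default HashPartitioner.getPartition:
    // mask off the sign bit, then take the remainder modulo the reducer count.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Equal keys must land on the same reducer; that only holds
        // if hashCode() is consistent with equals().
        System.out.println(getPartition("KeyType1:1", 4));
        System.out.println(getPartition("KeyType1:1", 4));
    }
}
```

With Java's default identity hashCode(), two logically equal MyKey instances would usually hash differently and end up on different reducers.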

As a rule of thumb, whenever you implement hashCode() or any sort of comparison method, you should provide an equals(Object) method as well. You will have to make sure it accepts an Object as the parameter, as this is how it is defined in the Object class (whose implementation you are probably overriding).
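As an illustration, equals(Object) and hashCode() could be added along these lines. This is a simplified stand-in for the MyKey class above, with the Writable and comparator plumbing omitted for brevity:

```java
import java.util.Objects;

// Simplified stand-in for MyKey: same fields, Writable plumbing omitted.
class MyKeySketch {
    enum KeyType { KeyType1, KeyType2 }

    KeyType keyType;
    Long field1;
    Integer field2 = -1;
    String field3 = "";

    @Override
    public boolean equals(Object o) {
        // Must accept Object, as defined in java.lang.Object.
        if (this == o) return true;
        if (!(o instanceof MyKeySketch)) return false;
        MyKeySketch other = (MyKeySketch) o;
        return keyType == other.keyType
                && Objects.equals(field1, other.field1)
                && Objects.equals(field2, other.field2)
                && Objects.equals(field3, other.field3);
    }

    @Override
    public int hashCode() {
        // Keep hashCode consistent with equals, so that equal keys
        // are sent to the same reducer by the default partitioner.
        return Objects.hash(keyType, field1, field2, field3);
    }
}
```

Objects.equals handles the null-default fields safely, and Objects.hash keeps the two methods consistent by construction.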
