Custom Hadoop key and value: How to write the compareTo() method


Problem description


I need to emit a 2D double array as both key and value from a mapper. Similar questions have been posted on Stack Overflow, but none of them were answered.

I am doing some matrix multiplication on a given dataset, and after that I need to emit A*Atrans, which is a matrix, as the key, and Atrans*D, which is also a matrix, as the value. How do I emit these matrices from the mapper? Each value should correspond to its own key.

i.e. key -----> A*Atrans ---------> after multiplication the result is a 2D array declared as double; call it matrix "Ekey" (double[][] Ekey)

value -----> Atrans*D ---------> after multiplication the result is matrix "Eval" (double[][] Eval).

After that I need to emit these matrices to the reducer for further calculations.

So in mapper: 
       context.write(Ekey,Eval);

Reducer:
      I need to do further calculations with these Ekey and Eval.

I wrote my class:

UPDATE

    public class MatrixWritable implements WritableComparable<MatrixWritable> {

        /**
         * @param args
         */
        private double[][] value;
        private double[][] values;

        public MatrixWritable() {
            // TODO Auto-generated constructor stub
            setValue(new double[0][0]);
        }

        public MatrixWritable(double[][] value) {
            // TODO Auto-generated constructor stub
            this.value = value;
        }

        public void setValue(double[][] value) {
            this.value = value;
        }

        public double[][] getValue() {
            return values;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value.length);                 // write values
            for (int i = 0; i < value.length; i++) {
                out.writeInt(value[i].length);
            }
            for (int i = 0; i < value.length; i++) {
                for (int j = 0; j < value[i].length; j++) {
                    out.writeDouble(value[i][j]);
                }
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = new double[in.readInt()][];
            for (int i = 0; i < value.length; i++) {
                value[i] = new double[in.readInt()];
            }
            values = new double[value.length][value[0].length];
            for (int i = 0; i < value.length; i++) {
                for (int j = 0; j < value[0].length; j++) {
                    values[i][j] = in.readDouble();
                }
            }
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + Arrays.hashCode(value);
            return result;
        }

        /* (non-Javadoc)
         * @see java.lang.Object#equals(java.lang.Object)
         */
        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null) {
                return false;
            }
            if (!(obj instanceof MatrixWritable)) {
                return false;
            }
            MatrixWritable other = (MatrixWritable) obj;
            if (!Arrays.deepEquals(value, other.value)) {
                return false;
            }
            return true;
        }

        @Override
        public int compareTo(MatrixWritable o) {
            // TODO Auto-generated method stub
            return 0;
        }

        public String toString() {
            String separator = "|";
            StringBuffer result = new StringBuffer();

            // iterate over the first dimension
            for (int i = 0; i < values.length; i++) {
                // iterate over the second dimension
                for (int j = 0; j < values[i].length; j++) {
                    result.append(values[i][j]);
                    result.append(separator);
                }
                // remove the last separator
                result.setLength(result.length() - separator.length());
                // add a line break.
                result.append(",");
            }

            return result.toString();
        }

    }

I am able to emit a matrix as the value from the mapper:

context.write(...,new MatrixWritable(AAtrans));

How do I emit the matrix AtransD as the key from the mapper?

For that I need to write the compareTo() method, right?

What should that method contain?

Solution

First, to implement a custom key you must implement WritableComparable. To implement a custom value you must implement Writable. Since it is often handy to be able to swap keys and values, most people write all custom types as WritableComparable.
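The answer does not spell out what compareTo() itself might contain. One possible ordering, offered as an assumption rather than as the only correct choice, is to compare dimensions first and then the elements one by one. The helper class `MatrixOrdering` and its method names below are hypothetical and are not part of Hadoop:

```java
import java.util.Arrays;

// Hypothetical helper (not a Hadoop class): one possible total order over double[][].
final class MatrixOrdering {

    // Compares row count, then each row's length, then elements in row-major order.
    static int compare(double[][] a, double[][] b) {
        int c = Integer.compare(a.length, b.length);
        if (c != 0) {
            return c;
        }
        for (int i = 0; i < a.length; i++) {
            c = Integer.compare(a[i].length, b[i].length);
            if (c != 0) {
                return c;
            }
            for (int j = 0; j < a[i].length; j++) {
                c = Double.compare(a[i][j], b[i][j]);
                if (c != 0) {
                    return c;
                }
            }
        }
        return 0; // same shape and same contents
    }

    // Content-based hash, so that equal matrices produce equal hash codes.
    static int hash(double[][] m) {
        return Arrays.deepHashCode(m);
    }
}
```

In a class like MatrixWritable, compareTo(MatrixWritable o) could then return `MatrixOrdering.compare(this.value, o.value)`. A compareTo() that always returns 0 tells the framework that all keys are equal. Note also that Arrays.hashCode on a double[][] hashes the inner arrays by identity, whereas Arrays.deepHashCode hashes their contents, which matches an equals() based on Arrays.deepEquals.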

Here is a link to the section of Hadoop: The Definitive Guide that covers writing a WritableComparable: "Writing A Custom Writable".

The trick with writing out an array is that on the read side you need to know how many elements to read. So the basic pattern is...

On write:
write the number of elements
write each element


On read:
read the number of elements (n)
create an array of the appropriate size
read elements 0 through n-1 and populate the array
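The write and read steps above can be sketched for a one-dimensional array using only java.io, with no Hadoop dependency. The class `LengthPrefixedArray` and its method names are made up for this illustration; the DataOutput and DataInput parameter types are the same ones that write() and readFields() receive:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative helper (hypothetical name): length-prefixed encoding of a double[].
final class LengthPrefixedArray {

    // On write: the number of elements first, then each element.
    static void write(DataOutput out, double[] data) throws IOException {
        out.writeInt(data.length);
        for (double d : data) {
            out.writeDouble(d);
        }
    }

    // On read: the count n, an array of that size, then elements 0 through n-1.
    static double[] read(DataInput in) throws IOException {
        int n = in.readInt();
        double[] data = new double[n];
        for (int i = 0; i < n; i++) {
            data[i] = in.readDouble();
        }
        return data;
    }

    // Serializes to an in-memory buffer and reads the bytes back.
    static double[] roundTrip(double[] data) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer), data);
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `LengthPrefixedArray.roundTrip(...)` on an array should give back an array with the same contents, which is a quick way to check that the read side mirrors the write side.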

Update

You should instantiate your array as empty in the default constructor to prevent a NullPointerException later.

The problem with your implementation is that it assumes that each inner array is of the same length. If that is true, you don't need to calculate the column length more than once. If false, you need to write the length of each row before writing the values of the row.

I would suggest something like this:

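 // inside write(DataOutput out); "array" is the double[][] being serialized
 out.writeInt(row); // number of rows, as calculated above
 for (int i = 0; i < row; i++) {
     double[] rowVals = array[i];
     out.writeInt(rowVals.length); // length of this row
     for (int j = 0; j < rowVals.length; j++) {
         out.writeDouble(rowVals[j]);
     }
 }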
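For completeness, here is a sketch of the per-row-length scheme together with the matching read side, again using only java.io. It assumes a single double[][] is used throughout; the class in the question reads the lengths into `value` but the numbers into `values`, which is worth double-checking. `JaggedMatrixIo` and its method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative helper (hypothetical name): serializes a possibly jagged double[][].
final class JaggedMatrixIo {

    // Row count first; then, for every row, its length followed by its values.
    static void write(DataOutput out, double[][] matrix) throws IOException {
        out.writeInt(matrix.length);
        for (double[] row : matrix) {
            out.writeInt(row.length);
            for (double v : row) {
                out.writeDouble(v);
            }
        }
    }

    // Mirrors write(): every length is read immediately before that row's values.
    static double[][] read(DataInput in) throws IOException {
        double[][] matrix = new double[in.readInt()][];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = new double[in.readInt()];
            for (int j = 0; j < matrix[i].length; j++) {
                matrix[i][j] = in.readDouble();
            }
        }
        return matrix;
    }

    // Serializes to an in-memory buffer and reads the bytes back.
    static double[][] roundTrip(double[][] matrix) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer), matrix);
            return read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bodies of write() and read() could be moved into write(DataOutput) and readFields(DataInput) of a Writable, with read() assigning to the field instead of returning. Because an empty matrix has no rows, this version never touches row 0 and so also works for the empty array created by the default constructor.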
