Implementation of an ArrayWritable for a custom Hadoop type


Problem Description



How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, with custom Hadoop types to store the data.

I have an IndividualPosting class which stores the term frequency, document ID, and the list of byte offsets for the term in the document.

I have a Posting class which has a document frequency (the number of documents the term appears in) and a list of IndividualPostings.

I have defined a LongArrayWritable extending the ArrayWritable class for the list of byte offsets in IndividualPosting.

When I defined a custom ArrayWritable for IndividualPosting, I encountered some problems after local deployment (using Karmasphere, Eclipse):

All the IndividualPosting instances in the list in the Posting class would be the same, even though I get different values in the Reduce method.

Solution

From the documentation of ArrayWritable:

A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}

You've already cited doing this with a WritableComparable type defined by Hadoop. Here's what I assume your implementation looks like for LongWritable:

public static class LongArrayWritable extends ArrayWritable
{
    public LongArrayWritable() {
        super(LongWritable.class);
    }
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}
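The no-argument constructor above is not optional: during deserialization, ArrayWritable.readFields creates each array element reflectively from the class passed to super(...), then calls readFields on it. A Hadoop-free sketch of that mechanism (LongBox and createAndFill are illustrative names for this sketch, not Hadoop API):

```java
import java.util.Arrays;

public class ReflectiveCreate {
    // Stand-in for a Writable element type; Hadoop needs exactly this kind of
    // public no-arg constructor to create element instances during readFields.
    public static class LongBox {
        public long value;
        public LongBox() {}
    }

    // Mirrors what ArrayWritable.readFields does internally: instantiate the
    // element class (the one captured by the subclass's super(...) call)
    // reflectively, then populate each fresh instance.
    public static long[] createAndFill(int n) throws Exception {
        LongBox[] values = new LongBox[n];
        for (int i = 0; i < n; i++) {
            values[i] = LongBox.class.getDeclaredConstructor().newInstance();
            values[i].value = i * 10L; // stands in for value.readFields(in)
        }
        return Arrays.stream(values).mapToLong(b -> b.value).toArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(createAndFill(3))); // prints "[0, 10, 20]"
    }
}
```

If the element class lacks a public no-arg constructor, this reflective step fails at runtime, which is why every ArrayWritable subclass repeats this constructor pattern.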

You should be able to do this with any type that implements WritableComparable, as given by the documentation. Using their example:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
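The write/readFields pair is a symmetric contract: whatever write puts on the DataOutput, readFields must consume in the same order from the DataInput. A minimal Hadoop-free sketch of that round trip (MyRecord and roundTrip are illustrative names, with the Hadoop interface dropped so the snippet compiles on its own):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Stand-alone copy of the example class, minus the Hadoop interface,
    // to show the symmetric serialization contract.
    public static class MyRecord {
        public int counter;
        public long timestamp;

        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        public void readFields(DataInput in) throws IOException {
            counter = in.readInt();   // same order as write()
            timestamp = in.readLong();
        }
    }

    // Serialize a record and read it back into a fresh instance,
    // the same sequence Hadoop performs when shuffling values.
    public static MyRecord roundTrip(MyRecord a) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        a.write(new DataOutputStream(bos));
        MyRecord b = new MyRecord();
        b.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        return b;
    }

    public static void main(String[] args) throws IOException {
        MyRecord a = new MyRecord();
        a.counter = 42;
        a.timestamp = 1234567890L;
        MyRecord b = roundTrip(a);
        System.out.println(b.counter + " " + b.timestamp); // prints "42 1234567890"
    }
}
```

If write and readFields ever disagree on field order or width, deserialization silently produces garbage rather than throwing, so keeping the two methods mirrored is the essential invariant.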

And that should be that. This assumes you're using revision 0.20.2 or 0.21.0 of the Hadoop API.
