Implementation of an ArrayWritable for a custom Hadoop type


Question

How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.

I have an IndividualPosting class that stores the term frequency, the document ID, and the list of byte offsets of the term in the document.

I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPosting objects.

I have already done this for the Hadoop-defined type used in IndividualPosting (the byte offsets).

When I defined a custom ArrayWritable for IndividualPosting, I ran into a problem after local deployment (using Karmasphere and Eclipse).

All the IndividualPosting instances in the Posting class's list turn out to be identical, even though I receive different values in the reduce method.

Recommended answer

From the ArrayWritable documentation:

A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}

You mention that you have already done this with a WritableComparable type defined by Hadoop. Here is what I assume your implementation looks like for LongWritable:

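import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;

// Declare this as "public static class" instead if you nest it inside your job class.
public class LongArrayWritable extends ArrayWritable {
    // Hadoop needs the no-arg constructor to create instances when deserializing.
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    // Convenience constructor for wrapping an existing array of values.
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}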

You should be able to do the same with any type that implements WritableComparable (strictly speaking, ArrayWritable only requires its elements to implement Writable), as described in the documentation. Using their example:

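import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Order instances by counter only; timestamp does not affect the ordering.
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}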
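Applied to the question's IndividualPosting, the same write/readFields pattern might look like the sketch below. The field names and types (termFrequency, documentId, offsets) and the roundTrip helper are assumptions made for illustration, not the asker's actual code. In a real job the class would implement org.apache.hadoop.io.Writable; the interface is left out here so the sketch compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the question's IndividualPosting. In a real job it
// would declare "implements org.apache.hadoop.io.Writable".
final class IndividualPosting {
    private int termFrequency;             // assumed field name
    private long documentId;               // assumed field name and type
    private long[] offsets = new long[0];  // byte offsets of the term

    // Hadoop creates Writables reflectively, so keep a no-arg constructor.
    IndividualPosting() {}

    IndividualPosting(int termFrequency, long documentId, long[] offsets) {
        this.termFrequency = termFrequency;
        this.documentId = documentId;
        this.offsets = offsets.clone();
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(termFrequency);
        out.writeLong(documentId);
        out.writeInt(offsets.length);      // length prefix for the variable-size list
        for (long offset : offsets) {
            out.writeLong(offset);
        }
    }

    public void readFields(DataInput in) throws IOException {
        termFrequency = in.readInt();
        documentId = in.readLong();
        // Allocate a fresh array on every read so a reused instance never
        // shares state with a previously read record.
        offsets = new long[in.readInt()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = in.readLong();
        }
    }

    int getTermFrequency() { return termFrequency; }
    long getDocumentId() { return documentId; }
    long[] getOffsets() { return offsets.clone(); }

    // Serializes a posting to bytes and reads it back into a new instance.
    static IndividualPosting roundTrip(IndividualPosting posting) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            posting.write(new DataOutputStream(bytes));
            IndividualPosting copy = new IndividualPosting();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a class like this in place, the array wrapper would presumably follow the LongArrayWritable pattern shown earlier, passing IndividualPosting.class to the ArrayWritable constructor instead of LongWritable.class.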

That should be all you need. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
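One common cause of the symptom described in the question is worth checking, though it is only a guess here because the reducer code is not shown. Hadoop reuses the value object it hands to the reducer and overwrites that same instance for every record. Storing the reference in a list therefore leaves you with many references to one object, all showing the last value read. The sketch below imitates that behaviour without Hadoop; MutablePosting, ReuseDemo, and their methods are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, Hadoop-free stand-in for a mutable Writable value.
final class MutablePosting {
    long documentId;

    MutablePosting(long documentId) { this.documentId = documentId; }

    // Copy constructor: takes a snapshot of the current field values.
    MutablePosting(MutablePosting other) { this.documentId = other.documentId; }
}

final class ReuseDemo {
    // Mimics the reducer's value iterator: one shared instance is overwritten
    // for every record instead of a new object being allocated each time.
    static Iterable<MutablePosting> reusingValues(long... documentIds) {
        MutablePosting shared = new MutablePosting(-1L);
        return () -> new Iterator<MutablePosting>() {
            private int index = 0;

            public boolean hasNext() { return index < documentIds.length; }

            public MutablePosting next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.documentId = documentIds[index++];
                return shared;
            }
        };
    }

    // Buggy: stores the same reference repeatedly.
    static List<MutablePosting> collectByReference(Iterable<MutablePosting> values) {
        List<MutablePosting> result = new ArrayList<>();
        for (MutablePosting value : values) {
            result.add(value);
        }
        return result;
    }

    // Fixed: copies each value before storing it.
    static List<MutablePosting> collectByCopy(Iterable<MutablePosting> values) {
        List<MutablePosting> result = new ArrayList<>();
        for (MutablePosting value : values) {
            result.add(new MutablePosting(value));
        }
        return result;
    }
}
```

In an actual reducer, the copy could be made with a copy constructor on IndividualPosting, or with WritableUtils.clone(value, conf) if your Hadoop version provides it, before the value is added to the Posting's list.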
