Custom WritableCompare displays object reference as output

Problem description

I am new to Hadoop and Java, and I feel there is something obvious I am just missing. I am using Hadoop 1.0.3 if that means anything.

My goal for using Hadoop is to take a bunch of files and parse them one file at a time (as opposed to line by line). Each file will produce multiple key-values, but context to the other lines is important. The key and value are multi-value/composite, so I have implemented WritableComparable for the key and Writable for the value. Because the processing of each file takes a bit of CPU, I want to save the output of the mapper, then run multiple reducers later on.

For the composite keys, I followed [http://stackoverflow.com/questions/12427090/hadoop-composite-key][1]

The problem is, the output is just Java object references as opposed to the composite key and value. Example: LinkKeyWritable@bd2f9730 LinkValueWritable@8752408c

I am not sure if the problem is related to not reducing the data at all or

Here is my main class:

public static void main(String[] args) throws Exception {
  JobConf conf = new JobConf(Parser.class);
  conf.setJobName("raw_parser");

  conf.setOutputKeyClass(LinkKeyWritable.class);
  conf.setOutputValueClass(LinkValueWritable.class);

  conf.setMapperClass(RawMap.class);
  conf.setNumMapTasks(0);

  conf.setInputFormat(PerFileInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  PerFileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

  JobClient.runJob(conf);
}

And my Mapper class:

public class RawMap extends MapReduceBase
        implements Mapper<NullWritable, Text, LinkKeyWritable, LinkValueWritable> {

    public void map(NullWritable key, Text value,
            OutputCollector<LinkKeyWritable, LinkValueWritable> output,
            Reporter reporter) throws IOException {
        String json = value.toString();
        SerpyReader reader = new SerpyReader(json);
        GoogleParser parser = new GoogleParser(reader);
        for (String page : reader.getPages()) {
            String content = reader.readPageContent(page);
            parser.addPage(content);
        }
        for (Link link : parser.getLinks()) {
            LinkKeyWritable linkKey = new LinkKeyWritable(link);
            LinkValueWritable linkValue = new LinkValueWritable(link);
            output.collect(linkKey, linkValue);
        }
    }
}

Link is basically a struct of various information that gets split between LinkKeyWritable and LinkValueWritable.

LinkKeyWritable:

public class LinkKeyWritable implements WritableComparable<LinkKeyWritable>{
    protected Link link;

    public LinkKeyWritable() {
        super();
        link = new Link();
    }

    public LinkKeyWritable(Link link) {
        super();
        this.link = link;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.batchDay = in.readLong();
        link.source = in.readUTF();
        link.domain = in.readUTF();
        link.path = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(link.batchDay);
        out.writeUTF(link.source);
        out.writeUTF(link.domain);
        out.writeUTF(link.path);
    }

    @Override
    public int compareTo(LinkKeyWritable o) {
        return ComparisonChain.start().
                compare(link.batchDay, o.link.batchDay).
                compare(link.domain, o.link.domain).
                compare(link.path, o.link.path).
                result();
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.batchDay, link.source, link.domain, link.path);
    }

    @Override
    public boolean equals(final Object obj){
        if(obj instanceof LinkKeyWritable) {
            final LinkKeyWritable o = (LinkKeyWritable)obj;
            return Objects.equal(link.batchDay, o.link.batchDay)
                    && Objects.equal(link.source, o.link.source)
                    && Objects.equal(link.domain, o.link.domain)
                    && Objects.equal(link.path, o.link.path);
        }
        return false;
    }
}

LinkValueWritable:

public class LinkValueWritable implements Writable{
    protected Link link;

    public LinkValueWritable() {
        link = new Link();
    }

    public LinkValueWritable(Link link) {
        this.link = new Link();
        this.link.type = link.type;
        this.link.description = link.description;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.type = in.readUTF();
        link.description = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(link.type);
        out.writeUTF(link.description);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.type, link.description);
    }

    @Override
    public boolean equals(final Object obj){
        // Compare against the value type, not the key type
        if(obj instanceof LinkValueWritable) {
            final LinkValueWritable o = (LinkValueWritable)obj;
            return Objects.equal(link.type, o.link.type)
                    && Objects.equal(link.description, o.link.description);
        }
        return false;
    }
}

Solution

I think the answer is in the implementation of the TextOutputFormat. Specifically, the LineRecordWriter's writeObject method:

/**
 * Write the object to the byte stream, handling Text as a special
 * case.
 * @param o the object to print
 * @throws IOException if the write throws, we pass it on
 */
private void writeObject(Object o) throws IOException {
  if (o instanceof Text) {
    Text to = (Text) o;
    out.write(to.getBytes(), 0, to.getLength());
  } else {
    out.write(o.toString().getBytes(utf8));
  }
}

As you can see, if your key or value is not a Text object, it calls the toString method on it and writes that out. Since you haven't overridden toString in your key and value classes, it's using the Object class's implementation, which prints the class name followed by '@' and the object's hash code in hex. That is the "reference" you are seeing.
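To see where that odd-looking output comes from, here is a minimal standalone sketch. PlainKey and DefaultToStringDemo are made-up names, not part of the question's code; the point is only what the inherited Object.toString produces for a class that does not override it:

```java
// Hypothetical class with no toString override, standing in for the key/value classes.
class PlainKey {
    long batchDay = 20120914L;
}

class DefaultToStringDemo {
    public static void main(String[] args) {
        PlainKey key = new PlainKey();

        // Object.toString() is documented as:
        //   getClass().getName() + "@" + Integer.toHexString(hashCode())
        String expected = PlainKey.class.getName() + "@" + Integer.toHexString(key.hashCode());

        // Prints the class name, '@', and a hex hash code (the hash varies between runs).
        System.out.println(key);

        // Confirms the printed form matches the documented formula.
        System.out.println(key.toString().equals(expected));
    }
}
```

Note that when a class overrides hashCode, as the question's classes do, the hex part is that overridden hash rather than anything resembling a memory address.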

I'd say that you should try writing an appropriate toString function or using a different OutputFormat.
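As a rough sketch of the first option, assuming Link exposes the four fields that write() serializes in the question: the classes below are simplified stand-ins (LinkKeySketch and ToStringDemo are illustrative names, and the Writable methods are left out so it compiles without Hadoop on the classpath). The tab delimiter is my own choice, not something Hadoop requires.

```java
// Stand-in for the question's Link struct; only the key fields are shown.
class Link {
    long batchDay;
    String source;
    String domain;
    String path;
}

// Simplified stand-in for LinkKeyWritable (write/readFields/compareTo omitted).
class LinkKeySketch {
    protected Link link;

    LinkKeySketch(Link link) {
        this.link = link;
    }

    // TextOutputFormat calls toString() on non-Text keys and values,
    // so whatever this returns is what lands in the output file.
    @Override
    public String toString() {
        return link.batchDay + "\t" + link.source + "\t" + link.domain + "\t" + link.path;
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        Link link = new Link();
        link.batchDay = 20120914L;
        link.source = "google";
        link.domain = "example.com";
        link.path = "/index.html";

        // Prints the four fields separated by tabs instead of ClassName@hash.
        System.out.println(new LinkKeySketch(link));
    }
}
```

In the real LinkKeyWritable and LinkValueWritable, the same kind of override would sit next to write and readFields. TextOutputFormat separates key and value with a tab by default, so tab-separated fields give one flat row per record; pick another delimiter if your fields can contain tabs. If the mapper output is only meant to be read back by a later job rather than by people, SequenceFileOutputFormat is probably the better fit, since it stores the Writable serialization instead of text.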
