How to serialize a Java object in Hadoop?


Problem Description



An object should implement the Writable interface in order to be serialized when transmitted in Hadoop. Take the Lucene ScoreDoc class as an example:

public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}

How should I serialize it with the Writable interface? What is the connection between the Writable and java.io.Serializable interfaces?

Solution

I think it won't be a good idea to tamper with the built-in Lucene class. Instead, have your own class that contains a field of type ScoreDoc and implements Hadoop's Writable interface. It would be something like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {

  private ScoreDoc sd;

  // Hadoop instantiates Writables reflectively, so a no-arg constructor is required
  public MyScoreDoc() {
  }

  public MyScoreDoc(ScoreDoc sd) {
    this.sd = sd;
  }

  public void write(DataOutput out) throws IOException {
      // sd.toString() yields "doc=<doc> score=<score> shardIndex=<shardIndex>"
      String[] splits = sd.toString().split(" ");

      // parse each field value out of the string
      int doc = Integer.parseInt(splits[0].split("=")[1]);
      float score = Float.parseFloat(splits[1].split("=")[1]);
      int shardIndex = Integer.parseInt(splits[2].split("=")[1]);

      out.writeInt(doc);
      out.writeFloat(score);   // score is a float, so writeFloat, not writeInt
      out.writeInt(shardIndex);
  }

  public void readFields(DataInput in) throws IOException {
      // read the fields back in the same order they were written
      int doc = in.readInt();
      float score = in.readFloat();
      int shardIndex = in.readInt();

      // note the constructor order: ScoreDoc(doc, score, shardIndex)
      sd = new ScoreDoc(doc, score, shardIndex);
  }

  //String toString()
}
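
To see the Writable contract in action outside a full MapReduce job, you can round-trip the wrapper through plain Java streams, the same DataOutput/DataInput path Hadoop uses when shuffling values. This is a minimal sketch, not part of the original answer; it assumes the two constructors added above and Hadoop plus Lucene on the classpath:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.lucene.search.ScoreDoc;

public class MyScoreDocRoundTrip {
  public static void main(String[] args) throws IOException {
    // serialize: write the wrapped ScoreDoc into an in-memory buffer
    MyScoreDoc original = new MyScoreDoc(new ScoreDoc(42, 1.5f, -1));
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    original.write(new DataOutputStream(buffer));

    // deserialize: rebuild a MyScoreDoc from the same bytes,
    // just as Hadoop does on the receiving side
    MyScoreDoc copy = new MyScoreDoc();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
  }
}

As a design note: since ScoreDoc's doc, score, and shardIndex fields are public, write() could also read them directly (sd.doc, sd.score, sd.shardIndex) instead of parsing the toString() output, which is simpler and won't break if the toString() format ever changes.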
