How to serialize a Java Object in Hadoop?
Question
An object should implement the Writable interface in order to be serialized when transmitted in Hadoop. Take the Lucene ScoreDoc class as an example:
public class ScoreDoc implements java.io.Serializable {

    /** The score of this document for the query. */
    public float score;

    /** Expert: A hit document's number.
     * @see Searcher#doc(int) */
    public int doc;

    /** Only set by {@link TopDocs#merge} */
    public int shardIndex;

    /** Constructs a ScoreDoc. */
    public ScoreDoc(int doc, float score) {
        this(doc, score, -1);
    }

    /** Constructs a ScoreDoc. */
    public ScoreDoc(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }

    // A convenience method for debugging.
    @Override
    public String toString() {
        return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
    }
}
How should I serialize it with the Writable interface? What is the connection between the Writable and java.io.Serializable interfaces?
Solution: I think it won't be a good idea to tamper with the built-in Lucene class. Instead, write your own class that contains a field of type ScoreDoc and implements Hadoop's Writable interface. It would be something like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {

    private ScoreDoc sd;

    public void write(DataOutput out) throws IOException {
        // ScoreDoc's fields are public, so write them directly;
        // note that score is a float and needs writeFloat, not writeInt
        out.writeFloat(sd.score);
        out.writeInt(sd.doc);
        out.writeInt(sd.shardIndex);
    }

    public void readFields(DataInput in) throws IOException {
        // read the fields back in exactly the order they were written
        float score = in.readFloat();
        int doc = in.readInt();
        int shardIndex = in.readInt();
        // ScoreDoc's constructor takes (doc, score, shardIndex)
        sd = new ScoreDoc(doc, score, shardIndex);
    }
}
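The contract behind the write/readFields pair is symmetry: the fields must be read back in exactly the order, and with exactly the widths, in which they were written. That round trip can be checked with plain java.io streams and no Hadoop or Lucene dependency; this is a minimal sketch where MiniDoc is a hypothetical stand-in for ScoreDoc with the same three fields:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RoundTrip {

    // Hypothetical stand-in for ScoreDoc: same three public fields.
    static class MiniDoc {
        final int doc;
        final float score;
        final int shardIndex;

        MiniDoc(int doc, float score, int shardIndex) {
            this.doc = doc;
            this.score = score;
            this.shardIndex = shardIndex;
        }
    }

    // Mirrors MyScoreDoc.write(DataOutput): float, int, int in that order.
    static byte[] serialize(MiniDoc d) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeFloat(d.score);
            out.writeInt(d.doc);
            out.writeInt(d.shardIndex);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors MyScoreDoc.readFields(DataInput): same order, same widths.
    static MiniDoc deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            float score = in.readFloat();
            int doc = in.readInt();
            int shardIndex = in.readInt();
            return new MiniDoc(doc, score, shardIndex);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MiniDoc original = new MiniDoc(42, 0.87f, -1);
        MiniDoc copy = deserialize(serialize(original));
        System.out.println(copy.doc + " " + copy.score + " " + copy.shardIndex);
        // prints "42 0.87 -1"
    }
}
```

Note that the serialized form is a fixed 12 bytes (one float plus two ints); unlike java.io.Serializable, Writable carries no class metadata in the stream, which is why Hadoop favors it for compact, high-volume record I/O.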