Hadoop: Easy way to have object as output value without Writable interface
Problem description
I am trying to use Hadoop to train multiple models. My data are small enough to fit in memory, so I want to train one model in every map task.
My problem is that when I have finished training a model, I need to send it to the reducer. I am using Weka to train the model. I don't want to work out how to implement the Writable interface in the Weka classes, because that would take a lot of effort. I am looking for a simple way to do this.
The Classifier class in Weka implements the Serializable interface. How can I send this object to the reducer?
Edit
Here is a link that covers serialization of Weka objects: http://weka.wikispaces.com/Serialization
Here is what my code looks like.
Configuring the job (only part of the configuration is shown):
conf.set("io.serializations",
    "org.apache.hadoop.io.serializer.JavaSerialization,"
    + "org.apache.hadoop.io.serializer.WritableSerialization");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Classifier.class);
Map function:
// load the dataset into the data variable
Classifier tree = new J48();
tree.buildClassifier(data);
context.write(new Text("whatever"), tree);
My Map class extends Mapper<Object, Text, Text, Classifier>.
But I am getting this error:
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
What am I doing wrong?
You can define your own serialization mechanism:
- http://www.lexemetech.com/2008/07/rpc-and-serialization-with-hadoop.html
- https://issues.apache.org/jira/browse/HADOOP-1986
I think it revolves around implementing the Serialization interface and registering your implementation in the io.serializations configuration property.
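For context, Hadoop's SerializationFactory walks the classes listed in io.serializations and, as far as I know, uses the first one whose accept() method returns true for the key or value class. When none accepts the class, a NullPointerException in getSerializer like the one in the question is the usual symptom. JavaSerialization, to my understanding, accepts any class assignable to java.io.Serializable. The sketch below reproduces only that check with plain JDK calls; SerializableCheck and accepts are hypothetical names, not Hadoop API.

```java
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical helper, not part of Hadoop: it mimics the kind of test
// a Java-serialization-based framework applies to a class.
class SerializableCheck {

    // True when instances of c can be handled by plain Java serialization.
    static boolean accepts(Class<?> c) {
        return Serializable.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        // ArrayList implements Serializable, Runnable does not.
        System.out.println(accepts(ArrayList.class)); // prints "true"
        System.out.println(accepts(Runnable.class));  // prints "false"
    }
}
```

Running the same check against the class you pass to setOutputValueClass is a cheap way to see whether Java serialization could pick it up at all.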
In your case, if you just want to use Java serialization, set this property to:
org.apache.hadoop.io.serializer.JavaSerialization
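If registering JavaSerialization still does not work in your setup, one fallback (a different technique from the property setting above, offered only as a suggestion) is to turn the trained model into a byte array yourself and emit it as a BytesWritable, which Hadoop already handles. The sketch below uses only the JDK; ModelBytes, toBytes and fromBytes are hypothetical names, and an ArrayList stands in for the Weka classifier so the example runs without Weka or Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper: converts any Serializable object to bytes and back.
class ModelBytes {

    // Serialize the object with standard Java serialization.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object; the caller casts it to the expected type.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a trained model (assumption: the real model is Serializable).
        ArrayList<String> model = new ArrayList<>(Arrays.asList("stand-in", "model"));

        byte[] payload = toBytes(model);      // mapper side
        Object restored = fromBytes(payload); // reducer side

        System.out.println(model.equals(restored)); // prints "true"
    }
}
```

In the mapper you would then write something like context.write(key, new BytesWritable(ModelBytes.toBytes(tree))) with BytesWritable as the map output value class, and in the reducer rebuild the model from value.getBytes(). Note that getBytes() may return a backing array longer than getLength(), so copying only the valid bytes first is the safer choice.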