Hadoop: Easy way to have object as output value without Writable interface


Problem description


I am trying to use Hadoop to train multiple models. My data is small enough to fit in memory, so I want to train one model in every map task.

My problem is that once a model has been trained, I need to send it to the reducer. I am using Weka to train the model. I don't want to work out how to implement the Writable interface for Weka classes, because that would take a lot of effort. I am looking for a simple way to do this.

The Classifier class in Weka implements the Serializable interface. How can I send this object to the reducer?

Edit

Here is the link that mentions weka objects serialization: http://weka.wikispaces.com/Serialization

Here is what my code looks like. Configuring the job (only part of the configuration is shown):

     conf.set("io.serializations",
              "org.apache.hadoop.io.serializer.JavaSerialization,"
              + "org.apache.hadoop.io.serializer.WritableSerialization");
     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(Classifier.class);

Map function:

     // load dataset into the data variable
     Classifier tree = new J48();
     tree.buildClassifier(data);
     context.write(new Text("whatever"), tree);

My Map class extends Mapper<Object, Text, Text, Classifier>.

But I am getting this error:

     java.lang.NullPointerException
         at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:964)
         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
         at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:416)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
         at org.apache.hadoop.mapred.Child.main(Child.java:253)

What am I doing wrong?

Solution

You can define your own serialization mechanism.

I think it revolves around implementing the Serialization interface and registering your implementation in the io.serializations configuration property.

In your case, if you just want to use Java serialization, set this property to:

  • org.apache.hadoop.io.serializer.JavaSerialization
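
For context, here is a minimal sketch of the plain java.io round trip that Java serialization relies on. It is not taken from the answer above: the class SerializableBytes and its methods toBytes and fromBytes are hypothetical names chosen for illustration, and an ArrayList stands in for a trained model. The same calls should work for any Serializable object, which the question says includes the Weka Classifier.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop or Weka.
final class SerializableBytes {

    private SerializableBytes() {}

    /** Serializes any Serializable object into a byte array. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds an object of the expected type from bytes produced by toBytes. */
    static <T> T fromBytes(byte[] bytes, Class<T> type) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Class of the serialized object is not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a trained model: any Serializable object behaves the same way.
        ArrayList<String> model = new ArrayList<>(Arrays.asList("rule-1", "rule-2"));
        byte[] payload = toBytes(model);
        List<?> restored = fromBytes(payload, List.class);
        System.out.println("round trip equal: " + model.equals(restored));
    }
}
```

If registering JavaSerialization turns out to be awkward, one possible workaround (an assumption, not something the answer proposes) is to wrap such a payload in a BytesWritable in the mapper and rebuild the object in the reducer with fromBytes(..., Classifier.class). In that case, note that BytesWritable.getBytes() returns a backing array that can be longer than getLength(), so only the valid prefix should be deserialized.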
