What is the reason for having Writable wrapper classes in Hadoop MapReduce for Java types?


Problem description



It seems to me that an org.apache.hadoop.io.serializer.Serialization could be written to serialize the Java types directly, in the same format that the wrapper classes serialize them into. That way the Mappers and Reducers wouldn't have to deal with the wrapper classes.
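For context, the wrapper classes the question refers to implement Hadoop's Writable contract: a `write(DataOutput)` / `readFields(DataInput)` pair that produces a fixed, compact binary format. Below is a minimal JDK-only sketch of that contract; `MyIntWritable` is a made-up stand-in for `org.apache.hadoop.io.IntWritable`, not Hadoop code:

```java
import java.io.*;

// Minimal sketch of the Writable contract using only the JDK.
// MyIntWritable is a hypothetical stand-in for org.apache.hadoop.io.IntWritable.
public class MyIntWritable {
    private int value;

    public MyIntWritable() {}                  // Writables need a no-arg constructor
    public MyIntWritable(int value) { this.value = value; }

    public int get() { return value; }

    // Writable#write: serialize the raw int as exactly 4 big-endian bytes
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Writable#readFields: deserialize from the same fixed format
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new MyIntWritable(42).write(new DataOutputStream(bytes));
        System.out.println(bytes.size());      // 4 — a fixed, compact encoding

        MyIntWritable copy = new MyIntWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.get());        // 42
    }
}
```

A Serialization that wrote plain `int`/`Integer` values in this same 4-byte form is exactly what the question envisions.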

Solution

There is nothing stopping you from changing the serialization to use a different mechanism, such as the Java Serializable interface, or something like Thrift, Protocol Buffers, etc.

In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects - just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:

io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
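For a cluster-wide change, the same property can go into core-site.xml. One caveat worth hedging: `io.serializations` is a comma-separated list, and replacing it outright drops the default `WritableSerialization`, so it is usually safer to append rather than replace, roughly like this:

```xml
<!-- core-site.xml sketch: io.serializations takes a comma-separated list.
     Keeping WritableSerialization in the list is usually safer, since
     framework-internal types are still Writables. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>
```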

Bear in mind, however, that anything that expects a Writable (input/output formats, partitioners, comparators, etc.) will need to be replaced by a version that can be passed a Serializable instance rather than a Writable instance.
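To see the trade-off involved, here is a small JDK-only sketch (the class name `JavaSerializationSizeDemo` is made up) that round-trips a plain `Integer` through standard Java serialization, the mechanism that Hadoop's `JavaSerialization` builds on, and compares the result with the 4 bytes a Writable-style `writeInt` would produce:

```java
import java.io.*;

// Round-trip a plain java.lang.Integer through standard Java serialization
// and observe that the stream carries class metadata on top of the value,
// making it considerably larger than a 4-byte Writable-style encoding.
public class JavaSerializationSizeDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(42);               // autoboxed to Integer
        }
        // Stream header plus the Integer class descriptor plus the value:
        // well over the 4 bytes that DataOutput#writeInt would need.
        System.out.println(bytes.size());

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.println(in.readObject()); // 42
        }
    }
}
```

The per-record overhead is one reason the Writable wrappers exist at all, and why swapping in Java serialization trades convenience for space and speed.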

Some more links for the curious reader:

