What is the reason for having Writable wrapper classes in Hadoop MapReduce for Java types?
It seems to me that an org.apache.hadoop.io.serializer.Serialization could be written to serialize Java types directly, in the same format that the wrapper classes use for those types. That way, Mappers and Reducers would not have to deal with the wrapper classes.
There is nothing stopping you from changing the serialization to use a different mechanism, such as the Java Serializable interface, or something like Thrift or Protocol Buffers.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects; just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:
io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
Bear in mind, however, that anything that expects a Writable (input/output formats, partitioners, comparators, etc.) will need to be replaced by a version that can be passed a Serializable instance rather than a Writable instance.
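To make the trade-off between the two mechanisms more concrete, here is a minimal sketch that uses only the JDK, not Hadoop itself. It writes an int the way IntWritable does (a single writeInt call on a DataOutput, i.e. four big-endian bytes) and compares that with pushing a boxed Integer through standard Java serialization. The class and method names are made up for illustration, and Hadoop's own JavaSerialization may manage the object stream differently, so treat the sizes as a rough indication only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of Hadoop.
public class SerializationSizeSketch {

    // Encodes the value as four big-endian bytes, the same layout
    // that IntWritable produces with DataOutput.writeInt.
    static byte[] writableStyle(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Encodes a boxed Integer with plain java.io serialization,
    // which also writes a stream header and class metadata.
    static byte[] javaSerializable(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("Writable-style bytes:  " + writableStyle(42).length);
        System.out.println("Java-serialized bytes: " + javaSerializable(42).length);
    }
}
```

Running main prints both sizes. The Writable-style encoding is always four bytes, while the Java-serialized form is noticeably larger because it carries type information alongside the value.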
Some more links for the curious reader:
- http://www.tom-e-white.com/2008/07/rpc-and-serialization-with-hadoop.html
- What are the connections and differences between Hadoop Writable and java.io.serialization? This seems to be a similar question to the one you're asking, and Tariq has a good link to a thread in which Doug Cutting explains the rationale behind using Writables over Serializables.