What is the reason for having Writable wrapper classes in Hadoop MapReduce for Java types?
Question
It seems to me that an org.apache.hadoop.io.serializer.Serialization could be written to serialize the Java types directly, in the same format that the wrapper classes serialize them into. That way, Mappers and Reducers would not have to deal with the wrapper classes.
Recommended answer
Nothing stops you from changing the serialization to use a different mechanism, such as the Java Serializable interface, or something like Thrift or Protocol Buffers.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects; just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization, but this can be changed by setting the following configuration property:
io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
Bear in mind, however, that anything that expects a Writable (input/output formats, partitioners, comparators, etc.) will need to be replaced by a version that can be passed a Serializable instance rather than a Writable instance.
Some more links for the curious reader:
- http://www.tom-e-white.com/2008/07/rpc-and-serialization-with-hadoop.html
- What are the connections and differences between Hadoop Writable and java.io.serialization? This appears to be a similar question to yours, and Tariq links to a thread in which Doug Cutting explains the rationale behind using Writables over Serializables.
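One commonly cited reason for preferring Writables is compactness: a Writable writes only its field data, while Java serialization also writes a stream header and class metadata. The sketch below illustrates this with the JDK alone. MiniWritable and MiniIntWritable are hypothetical stand-ins invented for this example, not Hadoop classes; they assume an IntWritable-style wrapper that writes its value with DataOutput.writeInt.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializationSizeDemo {

    // Hypothetical stand-in for the write side of a Writable-style contract,
    // defined locally so the example runs without Hadoop on the classpath.
    interface MiniWritable {
        void write(DataOutput out) throws IOException;
    }

    // Hypothetical IntWritable-style wrapper: emits only the raw 4-byte int.
    static final class MiniIntWritable implements MiniWritable {
        private final int value;

        MiniIntWritable(int value) {
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }
    }

    // Serializes the value the Writable way: field data only.
    public static byte[] writableBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            new MiniIntWritable(value).write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Serializes a boxed Integer with standard Java serialization,
    // which adds a stream header and a class descriptor.
    public static byte[] javaSerializableBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("Writable-style bytes:    " + writableBytes(42).length);
        System.out.println("Java Serializable bytes: " + javaSerializableBytes(42).length);
    }
}
```

The Writable-style encoding is exactly 4 bytes for an int, while the Java-serialized Integer is several times larger. In a job that shuffles a very large number of records, that per-record overhead is part of why Writable remains the default.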