为什么可写的数据类型是可变的? [英] Why should a Writable datatype be Mutable?
问题描述
为什么应该写一个可写的数据类型是可变的?
使用Text(vs String)作为Map,Combine,Shuffle或Reduce流程中的键/值的数据类型的优点是什么?
谢谢&问候,
Raja
您无法选择,这些数据类型必须 原因是序列化机制。让我们看看代码: 所以我们重复使用了相同的键/值对实例再次。 为什么?当时我不知道设计决定,但我认为这是减少垃圾对象的数量。请注意,Hadoop相当古老,当时垃圾收集器的效率不如现在,但即使在今天,如果您要映射数十亿个对象并直接将其作为垃圾丢弃,它在运行时会产生很大的差异。 为什么不能使 如果你想使它不可变,它肯定不适用于序列化过程,因为您需要定义 iterable中的每个值都引用 Why should a Writable datatype be Mutable?
What are the advantages of using Text (vs String) as a datatype for Key/Value in Map, Combine, Shuffle or Reduce processes? Thanks & Regards,
Raja You can't choose, these datatypes must be mutable. The reason is the serialization mechanism. Let's look at the code: So we are reusing the same instance of key/value pairs all over again. Why? I don't know about the design decisions back then, but I assume it was to reduce the amount of garbage objects. Note that Hadoop is quite old and back then the garbage collectors were not as efficient as they are today, however even today it makes a big difference in runtime if you would map billions of objects and directly throw them away as garbage. The real reason why you can't make the If you would make it immutable it would certainly not work with the serialization process anymore, because you would need to define However, you should ask yourself what kind of benefit an immutable key/value should have in Map/Reduce. In Joshua Bloch's- Effective Java, Item 15 he states that immutable classes are easier to design, implement and use. And he is right, because Hadoop's reducer is the worst possible example for mutability: Every value in the iterable refers to the In the end it boils down to the trade-off of performance (cpu and memory- imagine many billions of value objects for a single key must reside in RAM) vs. simplicity. 这篇关于为什么可写的数据类型是可变的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
// version 1.x MapRunner#run()
K1 key = input。 createKey();
V1 value = input.createValue();
(input.next(key,value)){
//映射对输出
mapper.map(key,value,output,reporter);
...
Writable
类型真正不可变的真正原因是您无法声明字段作为最终
。让我们用 IntWritable
作一个简单的例子:
public class IntWritable implements WritableComparable {
private int value;
public IntWritable(){}
public IntWritable(int value){set(value); }
...
value
final。这是行不通的,因为键和值是在运行时通过反射实例化的。这需要一个默认构造函数,因此 InputFormat
无法猜测填充最终数据字段所需的参数。因此,重用实例的整个概念显然与不可变性的概念相矛盾。然而,你应该问自己一个不可变的键/值在Map / Reduce中应该有什么样的好处。在Joshua Bloch的Effective Java第15项中,他指出不可变类更容易设计,实现和使用。他是对的,因为Hadoop的reducer是可变性最糟糕的例子:
void reduce(IntWritable key,Iterable< Text>值,Context上下文)...
相同
共享对象。因此,如果许多人将自己的价值观念变为正常收藏,并问自己为什么总是保持相同的价值观,那么他们会感到困惑。最后归结为性能的折衷(CPU和内存 - 想象单个密钥的数十亿个价值对象必须驻留在RAM中)与简单。 // version 1.x MapRunner#run()
K1 key = input.createKey();
V1 value = input.createValue();
while (input.next(key, value)) {
// map pair to output
mapper.map(key, value, output, reporter);
...
Writable
type really immutable is that you can't declare fields as final
. Let's make a simple example with the IntWritable
:public class IntWritable implements WritableComparable {
private int value;
public IntWritable() {}
public IntWritable(int value) { set(value); }
...
value
final. This can't work, because the keys and values are instantiated at runtime via reflection. This requires a default constructor and thus the InputFormat
can't guess the parameter that would be necessary to fill the final data fields. So the whole concept of reusing instances obviously contradicts the concept of immutability. void reduce(IntWritable key, Iterable<Text> values, Context context) ...
same
shared object. Thus many people are confused if they buffer their values into a normal collection and ask themselves why it always retains the same values.