Advantages of using NullWritable in Hadoop
What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I see the following from the "Hadoop: The Definitive Guide" book.
NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don't need to use that position; it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get().
I do not clearly understand how the output is written out using NullWritable. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?

Thanks,
Venkat
The key/value types must be given at runtime, so anything writing or reading NullWritables will know ahead of time that it will be dealing with that type; there is no marker or anything in the file. And technically the NullWritables are "read"; it's just that "reading" a NullWritable is actually a no-op. You can see for yourself that there's nothing at all written or read:
import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.NullWritable;

NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"
ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine: reading is a no-op
And as for your question about new Text(null), again, you can try it out:
Text text = new Text((String) null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // throws NullPointerException
System.out.println(Arrays.toString(out.toByteArray()));

Text will not work at all with a null String.
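The zero-length behavior is easy to reproduce outside Hadoop. Below is a minimal sketch, with no Hadoop dependency, of a hypothetical NullLikeWritable singleton whose write and readFields are no-ops; the class name and code are illustrative, not Hadoop's actual source, but they mirror the contract the book describes for NullWritable: one shared immutable instance and zero bytes on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class NullLikeWritable {
    // Immutable singleton, mirroring NullWritable.get().
    private static final NullLikeWritable INSTANCE = new NullLikeWritable();

    private NullLikeWritable() {}

    public static NullLikeWritable get() {
        return INSTANCE;
    }

    // Zero-length serialization: write nothing to the stream.
    public void write(DataOutput out) throws IOException {}

    // Zero-length deserialization: read nothing from the stream.
    public void readFields(DataInput in) throws IOException {}

    public static void main(String[] args) throws IOException {
        NullLikeWritable nw = NullLikeWritable.get();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        nw.write(new DataOutputStream(out));
        System.out.println(out.size()); // prints 0: no bytes were written
        nw.readFields(new DataInputStream(
                new ByteArrayInputStream(new byte[0]))); // no-op, no exception
    }
}
```

Because every instance is the same object and serialization is empty, a reader only needs the runtime type to "reconstruct" the value, which is exactly why no marker is needed in the file.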