Serialization using ArrayWritable seems to work in a funny way
Question
I was working with ArrayWritable, and at some point I needed to check how Hadoop serializes it. This is what I got after setting job.setNumReduceTasks(0):
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
And this is the test mapper I was using:
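import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds a single integer.
        int red = Integer.parseInt(value.toString());
        // Build an array of 100 consecutive values starting at that integer.
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}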
IntArrayWritable is taken from the example given in the ArrayWritable javadoc:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
I actually checked the Hadoop source code, and this makes no sense to me. ArrayWritable should not serialize the class name, and there is no way that an array of 100 IntWritable values can be serialized using 6 or 7 hexadecimal digits. The application actually seems to work just fine, and the reducer deserializes the right values.
What is happening? What am I missing?
Recommended answer
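The problem is that the output you are getting from your MapReduce job is not the serialized version of that data. It is the data translated into a printable string. Values such as IntArrayWritable@10f11b8 have the shape of Java's default Object.toString() result (class name, then @, then the hash code in hexadecimal), which is what you see when a class does not override toString().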
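To illustrate where a string like IntArrayWritable@10f11b8 comes from, here is a minimal sketch in plain Java with no Hadoop dependency. PlainArray, PrintableArray, and defaultForm are hypothetical names made up for this example; they only stand in for a value class without and with a toString() override.

```java
import java.util.Arrays;

public class ToStringDemo {
    // Hypothetical stand-in for a value class that does NOT override toString().
    static class PlainArray {
        final int[] values;
        PlainArray(int... values) { this.values = values; }
    }

    // Same data, but with a readable toString() override.
    static class PrintableArray extends PlainArray {
        PrintableArray(int... values) { super(values); }
        @Override
        public String toString() { return Arrays.toString(values); }
    }

    // Rebuilds the string that Object.toString() is documented to return.
    static String defaultForm(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        PlainArray plain = new PlainArray(1, 2, 3);
        // Prints the class name, '@', and a hex hash code; the array contents never appear.
        System.out.println(plain);
        // Prints the contents, because toString() is overridden.
        System.out.println(new PrintableArray(1, 2, 3));
    }
}
```

If the same reasoning holds for the job in the question, adding a toString() override to IntArrayWritable should make the map-only output readable, without changing how the values are serialized between map and reduce.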
When you set the number of reducers to zero, your mappers' output is passed directly through an output format, which formats your data, likely converting it to a readable string. It does not dump the data out serialized, as it would if it were going to be picked up by a reducer.
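For contrast, the sketch below shows roughly what the binary form of such an array would look like. It assumes the layout is a 4-byte element count followed by each element, with each int written as 4 bytes, which is my reading of ArrayWritable.write and IntWritable.write. FakeIntArray is a hypothetical stand-in written with java.io only, not a Hadoop class, and the framing Hadoop adds around intermediate records is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SerializedSizeDemo {
    // Hypothetical stand-in that mimics the assumed layout:
    // element count first, then every element as a 4-byte big-endian int.
    static class FakeIntArray {
        final int[] values;
        FakeIntArray(int[] values) { this.values = values; }

        void write(DataOutput out) throws IOException {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    // Serializes `count` consecutive ints starting at `start`, like the mapper's array.
    static byte[] serialize(int start, int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = start + i;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            new FakeIntArray(values).write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // 4 bytes for the count plus 100 * 4 bytes for the elements.
        System.out.println(serialize(0, 100).length + " bytes"); // prints "404 bytes"
    }
}
```

Under these assumptions, 100 values take 404 bytes, far more than a 6 or 7 digit hexadecimal string, which is consistent with the text output being a toString() result and not the serialized data.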