Why is the LongWritable key not used in the Mapper class?
Question
Mapper:
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The key (the line's offset) is ignored; only the line text is parsed.
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
Reducer:
Again, four formal type parameters specify the input and output types, this time for the reduce function. The input types of the reduce function must match the output types of the map function: Text and IntWritable.
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
But in this example, the key is never used.
What is the purpose of the key in the Mapper if it is never used at all?
Why is the key a LongWritable?
Recommended answer
The input format used in this example is TextInputFormat, which produces key/value pairs of type LongWritable/Text.
Here the LongWritable key is the byte offset, within the input file, at which the current line begins, and the Text value is the content of that line.
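To make the offset idea concrete, here is a minimal sketch in plain Java that mimics the key/value pairs a line-oriented reader would hand to the mapper. It is not Hadoop's actual record reader: the class name LineOffsets and the method offsets are hypothetical, and the sketch assumes UTF-8 text with '\n' line terminators.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: simulates the LongWritable keys
// that a line-oriented input format would produce for a small text file.
class LineOffsets {

    // Returns the byte offset at which each line starts,
    // assuming UTF-8 encoding and '\n' line terminators.
    static List<Long> offsets(String content) {
        List<Long> result = new ArrayList<>();
        long pos = 0;
        for (String line : content.split("\n")) {
            result.add(pos);
            // Advance past the line's bytes plus one byte for the '\n'.
            pos += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "first line\nsecond\nthird one\n";
        String[] lines = content.split("\n");
        List<Long> keys = offsets(content);
        // Print each (key, value) pair as the mapper would receive it.
        for (int i = 0; i < lines.length; i++) {
            System.out.println("(" + keys.get(i) + ", " + lines[i] + ")");
        }
    }
}
```

For this input the keys are 0, 11, and 18: each key is simply where its line starts in the file, which is why a mapper that only cares about the line content can ignore it.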
That does not mean the line offset carried by the LongWritable key is useless. Whether it matters depends on the use case; in your case, the input key simply is not significant.
Besides TextInputFormat, there are many other InputFormat types, each of which parses the input file in its own way and produces its own key/value pairs.
For example, KeyValueTextInputFormat (which, like TextInputFormat, extends FileInputFormat) splits every line at a configurable delimiter, a tab by default, and produces key/value pairs of type Text/Text.
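As a rough illustration of that splitting rule, the following plain-Java sketch divides a line at the first occurrence of a separator. It only imitates the behavior described above and does not use the Hadoop classes; the class name KeyValueSplit and the method split are hypothetical, and treating a line without a separator as "whole line is the key, value is empty" is an assumption made for this sketch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: imitates how a key/value text
// format could turn one line into a (key, value) pair of strings.
class KeyValueSplit {

    // Splits the line at the FIRST occurrence of the separator.
    // Assumption: if the separator is absent, the whole line becomes
    // the key and the value is the empty string.
    static Map.Entry<String, String> split(String line, char separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, idx), line.substring(idx + 1));
    }

    public static void main(String[] args) {
        // Only the first tab separates key from value; later tabs stay in the value.
        Map.Entry<String, String> kv = split("1950\t0022\textra", '\t');
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```

With such a format the mapper would be declared with Text as its input key type instead of LongWritable, because the key now comes from the line content rather than from its position in the file.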
Below are a few input formats and their key/value types:
KeyValueTextInputFormat    Text/Text
NLineInputFormat           LongWritable/Text
FixedLengthInputFormat     LongWritable/BytesWritable
Beyond these, a few input formats take generic key/value types that you specify when declaring them, such as SequenceFileInputFormat and CombineFileInputFormat. Have a look at the chapter on input formats in Hadoop: The Definitive Guide.
Hope this helps.