Get input file in Reducer
Question
I am trying to write a MapReduce job in which I need to iterate over the values twice.
Given a numerical CSV file, we need to apply this to each column.
For that we need to find the min and max values of the column and plug them into equation (v1).
What I have done so far:
In map(): I emit the column id as the key and each value of that column as the value.
In reduce(): I calculate the min and max values of each column.
After that I am stuck. My next goal is to apply the equation
v = [(v - minA) / (maxA - minA)] * (new maxA - new minA) + new minA    (v1)
My new maxA and new minA are 0.1 and 0.0 respectively, and I also have each column's max and min.
In order to apply equation (v1) I need v, i.e. the values from the input file.
How can I get that?
What I am thinking is:
From the input CSV file, take the first row (iris dataset):
[5.3,3.6,1.6,0.3]
Apply the equation to each attribute and emit the entire row (the min and max values are known in the reducer itself). But in the reducer I only get the column values. Otherwise I would have to pass my input file as an argument and read it in the setup() method of the reducer.
Is that a best practice? Any suggestions are welcome.
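The rescaling in equation (v1) can be sketched as a small helper. This is only an illustration: the class name MinMaxScaler, the method name rescale, and the handling of a constant column (where maxA equals minA and the formula would divide by zero) are assumptions, not something the original post specifies.

```java
// Hypothetical helper illustrating equation (v1); not part of the original post.
final class MinMaxScaler {
    private MinMaxScaler() {}

    /** Rescales v from the range [minA, maxA] to the range [newMinA, newMaxA]. */
    static double rescale(double v, double minA, double maxA,
                          double newMinA, double newMaxA) {
        if (maxA == minA) {
            // Assumption: a constant column maps to the lower bound,
            // because the formula would otherwise divide by zero.
            return newMinA;
        }
        return (v - minA) / (maxA - minA) * (newMaxA - newMinA) + newMinA;
    }

    public static void main(String[] args) {
        // 4.9 in a column whose min is 4.6 and max is 5.3, rescaled to [0.0, 0.1]
        System.out.println(rescale(4.9, 4.6, 5.3, 0.0, 0.1));
    }
}
```

With the new range from the question (0.0 to 0.1), the column minimum maps to 0.0 and the column maximum maps to 0.1.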
UPDATE
As Mark Vickery suggested, I did the following.
public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
    System.out.println("in reducer");
    // Start from the extremes so that any value, including negatives, updates min and max
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    Iterator<DoubleWritable> iterator = values.iterator();
    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(iterator);
    System.out.println("Using ListIterator 1st pass");
    while (lit.hasNext()) {
        // Call next() only once per iteration; calling it twice skips every other value
        DoubleWritable value = lit.next();
        System.out.println(value);
        if (value.get() < min) {
            min = value.get();
        }
        if (value.get() > max) {
            max = value.get();
        }
    }
    System.out.println(min);
    System.out.println(max);
    // move the list iterator back to start
    while (lit.hasPrevious()) {
        lit.previous();
    }
    System.out.println("Using ListIterator 2nd pass");
    double x = 0;
    while (lit.hasNext()) {
        System.out.println(lit.next());
    }
}
In the first pass,
Recommended answer
I found the answer. If we try to iterate twice in the reducer as below:
ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(it);
System.out.println("Using ListIterator 1st pass");
while (lit.hasNext()) {
    System.out.println(lit.next());
}
// move the list iterator back to start
while (lit.hasPrevious()) {
    lit.previous();
}
System.out.println("Using ListIterator 2nd pass");
while (lit.hasNext()) {
    System.out.println(lit.next());
}
we will only get this output:
Using ListIterator 1st pass
5.3
4.9
5.3
4.6
4.6
Using ListIterator 2nd pass
5.3
5.3
5.3
5.3
5.3
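A likely explanation for the repeated value is that Hadoop reuses a single value object while iterating in the reducer, so a list that stores references ends up holding the same object many times. The sketch below simulates that behaviour in plain Java without Hadoop. The names ReuseDemo, Box, reusingIterator, storeReferences, and storeCopies are hypothetical, and Box is only a stand-in for DoubleWritable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical simulation of an iterator that reuses one mutable object.
final class ReuseDemo {
    /** Mutable holder standing in for DoubleWritable (illustration only). */
    static final class Box {
        double value;
    }

    /** Returns an iterator that hands out the SAME Box each time, overwriting its value. */
    static Iterator<Box> reusingIterator(final double... data) {
        final Box shared = new Box();
        return new Iterator<Box>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Box next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[i++];
                return shared;
            }
        };
    }

    /** Stores the references and reads them afterwards: every entry is the same object. */
    static List<Double> storeReferences(double... data) {
        List<Box> refs = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) {
            refs.add(it.next());
        }
        List<Double> seen = new ArrayList<>();
        for (Box b : refs) {
            seen.add(b.value);
        }
        return seen;
    }

    /** Copies each value out immediately, so later overwrites do not affect it. */
    static List<Double> storeCopies(double... data) {
        List<Double> seen = new ArrayList<>();
        Iterator<Box> it = reusingIterator(data);
        while (it.hasNext()) {
            seen.add(it.next().value);
        }
        return seen;
    }
}
```

In this simulation storeReferences returns one value repeated for every element, while storeCopies returns the values in their original order.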
To get it right, we should loop like this:
ArrayList<DoubleWritable> cache = new ArrayList<DoubleWritable>();
for (DoubleWritable aNum : values) {
    System.out.println("first iteration: " + aNum);
    // Copy the value into a new object, because Hadoop reuses the same
    // DoubleWritable instance for every element of the iteration
    DoubleWritable writable = new DoubleWritable();
    writable.set(aNum.get());
    cache.add(writable);
}
int size = cache.size();
for (int i = 0; i < size; ++i) {
    System.out.println("second iteration: " + cache.get(i));
}
Output:
first iteration: 5.3
first iteration: 4.9
first iteration: 5.3
first iteration: 4.6
first iteration: 4.6
second iteration: 5.3
second iteration: 4.9
second iteration: 5.3
second iteration: 4.6
second iteration: 4.6
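Putting the cached second pass together with the question's goal, the whole column can be rescaled in two passes. The sketch below uses plain Java collections instead of Hadoop types so that it runs on its own. ColumnNormalizer and normalize are hypothetical names, and in a real reducer the first loop would copy each value out with DoubleWritable.get() as in the answer above. Caching assumes the values of one column fit in the reducer's memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-pass sketch: cache and find min/max, then rescale.
final class ColumnNormalizer {
    private ColumnNormalizer() {}

    static List<Double> normalize(Iterable<Double> values, double newMin, double newMax) {
        List<Double> cache = new ArrayList<>();
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        // First pass: copy every value and track the column's min and max.
        for (double v : values) {
            cache.add(v);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        // Second pass: iterate over the cached copies and apply the rescaling equation.
        List<Double> scaled = new ArrayList<>(cache.size());
        for (double v : cache) {
            // Assumption: a constant column maps to newMin to avoid dividing by zero.
            double s = (max == min)
                    ? newMin
                    : (v - min) / (max - min) * (newMax - newMin) + newMin;
            scaled.add(s);
        }
        return scaled;
    }
}
```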