Get input file in Reducer


Question

I am trying to write a MapReduce job in which I need to iterate over the values twice.

Given a numerical CSV file, we need to apply this to each column.

For that, we need to find the min and max values of the column and apply them in the equation (v1).

What I have done so far:

In map(), I emit the column id as the key and each value of that column as the value.
In reduce(), I calculate the min and max values of each column.
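The map step described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration: the class name `ColumnEmitSketch` and the helper `emitColumns` are hypothetical, and the sketch assumes a comma-separated row containing only numeric fields.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ColumnEmitSketch {
    /**
     * Splits one CSV row into (column index, value) pairs, which is what
     * map() would emit as key and value. Hypothetical helper, not Hadoop API.
     */
    static List<Map.Entry<Integer, Double>> emitColumns(String csvRow) {
        List<Map.Entry<Integer, Double>> out = new ArrayList<>();
        String[] fields = csvRow.split(",");
        for (int col = 0; col < fields.length; col++) {
            // Assumption: every field parses as a double.
            out.add(new SimpleEntry<>(col, Double.parseDouble(fields[col].trim())));
        }
        return out;
    }

    public static void main(String[] args) {
        // One row of the iris dataset, as quoted in the question.
        for (Map.Entry<Integer, Double> e : emitColumns("5.3,3.6,1.6,0.3")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job the pairs would be written with `context.write(...)`, and the framework would then group all values of one column under the same key before calling reduce().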

After that I am stuck. My next aim is to apply the equation (v1):

v = [(v - minA) / (maxA - minA)] * (new maxA - new minA) + new minA

My new maxA and new minA are 0.1 and 0.0 respectively, and I also have each column's max and min. In order to apply eqn v1 I need to get v, i.e. the input file.
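As a minimal sketch of the min-max scaling the question describes, the equation can be written as a small Java function. The class name `MinMaxScale`, the method `scale`, and the column bounds used in `main` are assumptions made for illustration. The handling of a constant column (max equal to min) is also an assumption, since the question does not say what should happen in that case.

```java
public class MinMaxScale {
    /**
     * Rescales v from the range [min, max] to the range [newMin, newMax].
     * Assumption: a constant column (max == min) maps to newMin, which
     * avoids a division by zero.
     */
    static double scale(double v, double min, double max, double newMin, double newMax) {
        if (max == min) {
            return newMin;
        }
        return (v - min) / (max - min) * (newMax - newMin) + newMin;
    }

    public static void main(String[] args) {
        // Hypothetical column bounds; the target range 0.0 to 0.1 is from the question.
        double min = 4.3, max = 7.9;
        System.out.println(scale(5.3, min, max, 0.0, 0.1));
    }
}
```

The column minimum maps to the new minimum and the column maximum maps to the new maximum; every other value lands proportionally in between.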

How can I get it?

What I have in mind is:

From the input CSV file, take the first row (iris dataset):

[5.3,3.6,1.6,0.3]

Apply the equation to each attribute and emit the entire row (the min and max values are known in the Reducer itself). But in the reducer I only get the column values. Otherwise, I would have to pass my input file as an argument and read it in the reducer's setup().

Is that a best practice? Any suggestions?

UPDATE

As Mark Vickery suggested, I did the following.

public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
    System.out.println("in reducer");
    // Start from the extremes so that negative or very large values are handled.
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    Iterator<DoubleWritable> iterator = values.iterator();
    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(iterator);
    System.out.println("Using ListIterator 1st pass");
    while (lit.hasNext()) {
        // Call next() only once per element; calling it twice skips every other value.
        DoubleWritable value = lit.next();
        System.out.println(value);
        if (value.get() < min) {
            min = value.get();
        }
        if (value.get() > max) {
            max = value.get();
        }
    }
    System.out.println(min);
    System.out.println(max);

    // move the list iterator back to start
    while (lit.hasPrevious()) {
        lit.previous();
    }

    System.out.println("Using ListIterator 2nd pass");
    while (lit.hasNext()) {
        System.out.println(lit.next());
    }
}

On the first pass,

Recommended answer

I found the answer. If we try to iterate twice in the Reducer as below:

    ListIterator<DoubleWritable> lit = IteratorUtils.toListIterator(it);
    System.out.println("Using ListIterator 1st pass");
    while(lit.hasNext())
        System.out.println(lit.next());

    // move the list iterator back to start
    while(lit.hasPrevious())
        lit.previous();

    System.out.println("Using ListIterator 2nd pass");
    while(lit.hasNext())
        System.out.println(lit.next());

we will only get the following output:

Using ListIterator 1st pass
5.3
4.9
5.3
4.6
4.6
Using ListIterator 2nd pass
5.3
5.3
5.3
5.3
5.3
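A likely explanation for the repeated value is that Hadoop reuses a single Writable instance while iterating over the reducer's values, so a list that stores references ends up holding the same object many times. The plain-Java simulation below illustrates that effect without Hadoop. `Box` is a hypothetical stand-in for `DoubleWritable`, and `reusingIterator` is an assumed model of the framework's behaviour, not its actual implementation. Which value ends up repeated in a real job depends on the framework's internals and may differ from this simulation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseSketch {
    /** Mutable holder standing in for DoubleWritable (hypothetical). */
    static final class Box {
        double v;
    }

    /** Iterator that returns the SAME Box on every call, overwriting its content. */
    static Iterator<Box> reusingIterator(final double... data) {
        final Box shared = new Box();
        return new Iterator<Box>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < data.length; }
            @Override public Box next() {
                if (i >= data.length) throw new NoSuchElementException();
                shared.v = data[i++];
                return shared;
            }
        };
    }

    /** Caches references: every list entry points at the one shared Box. */
    static List<Double> cacheReferences(double... data) {
        List<Box> refs = new ArrayList<>();
        for (Iterator<Box> it = reusingIterator(data); it.hasNext();) {
            refs.add(it.next());
        }
        List<Double> seen = new ArrayList<>();
        for (Box b : refs) {
            seen.add(b.v); // read only after the first pass has finished
        }
        return seen;
    }

    /** Caches copies of the contents, so each entry keeps its own value. */
    static List<Double> cacheCopies(double... data) {
        List<Double> copies = new ArrayList<>();
        for (Iterator<Box> it = reusingIterator(data); it.hasNext();) {
            copies.add(it.next().v);
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("references: " + cacheReferences(5.3, 4.9, 5.3, 4.6, 4.6));
        System.out.println("copies:     " + cacheCopies(5.3, 4.9, 5.3, 4.6, 4.6));
    }
}
```

The point of the sketch is that a second pass is only reliable when the first pass copies each value out of the reused object.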

To get it right, we should copy each value into a cache on the first pass and loop like this:

ArrayList<DoubleWritable> cache = new ArrayList<DoubleWritable>();
 for (DoubleWritable aNum : values) {
    System.out.println("first iteration: " + aNum);
    DoubleWritable writable = new DoubleWritable();
    writable.set(aNum.get());
    cache.add(writable);
 }
 int size = cache.size();
 for (int i = 0; i < size; ++i) {
     System.out.println("second iteration: " + cache.get(i));
  }

Output:

first iteration: 5.3
first iteration: 4.9
first iteration: 5.3
first iteration: 4.6
first iteration: 4.6
second iteration: 5.3
second iteration: 4.9
second iteration: 5.3
second iteration: 4.6
second iteration: 4.6
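The caching loop can be combined with the scaling equation from the question. The sketch below does this in plain Java over an `Iterable<Double>`, so it runs without Hadoop. The class `TwoPassScaleSketch` and the method `rescale` are hypothetical names, and mapping a constant column to the new minimum is an assumption. In a real reducer the first loop would read `DoubleWritable.get()` and the second loop would call `context.write(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassScaleSketch {
    /** Caches the values, finds min and max, then rescales to [newMin, newMax]. */
    static List<Double> rescale(Iterable<Double> values, double newMin, double newMax) {
        List<Double> cache = new ArrayList<>();
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;

        // First pass: copy each value as a primitive and track the extremes.
        for (double v : values) {
            cache.add(v);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        // Second pass: apply the equation to the cached copies.
        double range = max - min;
        List<Double> scaled = new ArrayList<>(cache.size());
        for (double v : cache) {
            // Assumption: a constant column (range == 0) maps to newMin.
            scaled.add(range == 0 ? newMin : (v - min) / range * (newMax - newMin) + newMin);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Column values from the answer's output; target range from the question.
        System.out.println(rescale(Arrays.asList(5.3, 4.9, 5.3, 4.6, 4.6), 0.0, 0.1));
    }
}
```

Note that this approach keeps all values of one key in memory, which may not be suitable for very large columns.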
