hadoop - map reduce task and static variable


Question



I just started working on some Hadoop/HBase MapReduce jobs (using Cloudera) and I have the following question:

Let's say we have a Java class with a main method and a static variable. That class defines inner classes corresponding to the Mapper and Reducer tasks. Before launching the job, main initializes the static variable. This variable is read in the Mapper class. The class is then launched using 'hadoop jar' on a cluster.

My question: I don't see how Map and Reduce tasks on other nodes can see that static variable. Is there any "Hadoop magic" that allows nodes to share a JVM or static variables? How can this even work? I have to work on a class doing just that, and I can't figure out how this is OK in a multi-node cluster. Thank you.

Solution

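In a distributed Hadoop cluster each Map/Reduce task runs in its own separate JVM. So there's no way to share a static variable between class instances running in different JVMs (let alone on different nodes). The value that main assigns exists only in the driver JVM, the one started by 'hadoop jar'; each task JVM loads the class afresh and sees only the field's initial value.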

But if you want to share some immutable data between tasks, you can use the Configuration class:

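import org.apache.hadoop.conf.Configuration;

// driver code
// Set values before creating the Job: the Job takes its own copy of the Configuration.
Configuration config = new Configuration(); // for HBase jobs: HBaseConfiguration.create()
config.setLong("foo.bar.somelong", 1337);
...

// mapper code
public class SomeMapper ... {
    private long someLong = 0;

    @Override
    protected void setup(Context context) {
        Configuration config = context.getConfiguration();
        // getLong takes a default value, returned when the key is not set
        someLong = config.getLong("foo.bar.somelong", 0L);
    }
}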
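
To see why the Configuration route works where a static field does not, here is a minimal sketch in plain Java. It is not Hadoop code: the class ConfigShippingSketch and its methods serialize and readInTask are hypothetical names, and java.util.Properties stands in for Hadoop's Configuration. The assumption it illustrates is that job settings are written out as data on the driver side and read back on the task side, roughly as Hadoop does with the job configuration, while a static field stays in the memory of the JVM that set it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

/** Plain-Java stand-in for the driver/task split; uses no Hadoop classes. */
public class ConfigShippingSketch {

    // Set in main(); on a real cluster only the driver JVM would ever hold this value.
    static long staticValue = 0;

    /** Driver side: turn the settings into bytes that could travel to another JVM. */
    static byte[] serialize(Properties conf) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.store(out, null);
        return out.toByteArray();
    }

    /** Task side: rebuild the settings from the shipped bytes and read one value. */
    static long readInTask(byte[] shipped, String key, long defaultValue) throws IOException {
        Properties conf = new Properties();
        conf.load(new ByteArrayInputStream(shipped));
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) throws IOException {
        staticValue = 1337; // lives only in this JVM's memory

        Properties conf = new Properties();
        conf.setProperty("foo.bar.somelong", Long.toString(staticValue));
        byte[] shipped = serialize(conf);

        // Only the bytes cross the process boundary; the static field itself does not.
        System.out.println(readInTask(shipped, "foo.bar.somelong", 0L)); // prints 1337
    }
}
```

The task side never touches staticValue; it only sees what was explicitly written into the shipped settings. That is the same contract as in the answer above: anything a Mapper or Reducer needs has to go into the Configuration before the job is submitted.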
