Global variables in Hadoop


Problem description

My program follows an iterative map/reduce approach, and it needs to stop when certain conditions are met. Is there any way I can set a global variable that is distributed across all map/reduce tasks, and check whether that global variable has reached the completion condition?

Something like this:

while (condition != true) {

            Configuration conf = getConf();
            Job job = new Job(conf, "Dijkstra Graph Search");

            job.setJarByClass(GraphSearch.class);
            job.setMapperClass(DijkstraMap.class);
            job.setReducerClass(DijkstraReduce.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Text.class);

            // run the job, then update the loop condition from its result
            job.waitForCompletion(true);
            condition = checkCompletion(job);
}

Here `condition` is a global variable that is modified during or after each map/reduce execution.

Recommended answer

Each time you run a map-reduce job, you can examine the state of the output, the values contained in the counters, and so on, and make a decision at the node that is controlling the iteration about whether you want one more iteration or not. I guess I don't understand where the need for a global state comes from in your scenario.
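The counter-driven loop described above can be sketched as follows. This is a minimal, runnable illustration of the driver-side control flow only: `runIteration` is a hypothetical stand-in for submitting the MapReduce job and reading back a counter (in a real driver that would be `job.waitForCompletion(true)` followed by `job.getCounters().findCounter(...).getValue()`, as the comment shows); the stub simply halves the count so the loop terminates.

```java
// Driver-side iteration: rerun the job until a "work remaining" counter hits zero.
public class IterativeDriver {
    // Stand-in for launching one MapReduce pass and reading its counter.
    static long runIteration(long pendingNodes) {
        // In a real driver this would be something like:
        //   Job job = Job.getInstance(conf, "Dijkstra Graph Search");
        //   ... configure mapper/reducer/output types ...
        //   job.waitForCompletion(true);
        //   return job.getCounters().findCounter("search", "UNSETTLED").getValue();
        return pendingNodes / 2;  // stub: work shrinks on each pass
    }

    public static void main(String[] args) {
        long pending = 100;   // stub starting value
        int iterations = 0;
        while (pending > 0) {             // loop until the counter says we're done
            pending = runIteration(pending);
            iterations++;
        }
        System.out.println("converged after " + iterations + " iterations");
    }
}
```

The key point is that the decision happens entirely at the submitting node, between job runs, so no shared mutable variable is needed inside the tasks themselves.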

More generally, there are two main ways state is shared between executing nodes (although it should be noted that sharing state is best avoided, since it limits scalability):

  1. Write a file to HDFS that the other nodes can read (make sure the file gets cleaned up when the job exits, and that speculative execution doesn't cause weird failures).
  2. Use ZooKeeper to store some data in dedicated ZK tree nodes.
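The flag-file pattern from option 1 can be sketched like this. To keep the sketch self-contained it uses `java.nio.file` against the local filesystem; on a cluster the same three operations would map to `FileSystem.exists`, `FileSystem.create`, and `FileSystem.delete` on an HDFS `Path`. The flag path is hypothetical.

```java
// Flag-file pattern: a task signals "done" by creating a file; the driver
// checks for it between iterations and cleans it up when the job exits.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlagFileDemo {
    // Hypothetical flag location; on a cluster this would be an HDFS Path.
    static final Path DONE_FLAG = Path.of("/tmp/graph-search.done");

    // Task side: create the flag when the stopping condition is met.
    static void signalDone() throws IOException {
        Files.createFile(DONE_FLAG);
    }

    // Driver side: check for the flag between iterations.
    static boolean isDone() {
        return Files.exists(DONE_FLAG);
    }

    public static void main(String[] args) throws IOException {
        Files.deleteIfExists(DONE_FLAG);          // start from a clean state
        System.out.println("done? " + isDone());  // prints "done? false"
        signalDone();
        System.out.println("done? " + isDone());  // prints "done? true"
        Files.deleteIfExists(DONE_FLAG);          // clean up on job exit
    }
}
```

Note the answer's caveat applies here: with speculative execution, multiple attempts of the same task may try to create the flag, so a real implementation should tolerate the file already existing.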
