Global variables in Hadoop
Problem description
My program follows an iterative map/reduce approach, and it needs to stop when certain conditions are met. Is there any way I can set a global variable that is shared across all map/reduce tasks, and check whether that variable has reached the completion condition?
Something like this:

while (condition != true) {
    Configuration conf = getConf();
    Job job = new Job(conf, "Dijkstra Graph Search");
    job.setJarByClass(GraphSearch.class);
    job.setMapperClass(DijkstraMap.class);
    job.setReducerClass(DijkstraReduce.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    job.waitForCompletion(true); // run the job; condition must then be updated from its results
}
Here condition is a global variable that is modified during or after each map/reduce execution.
Recommended answer
Each time a map/reduce job finishes, you can examine the state of its output, the values contained in its counters, and so on, and make a decision at the node that controls the iteration about whether you want one more iteration. I guess I don't understand where the need for global state comes from in your scenario.
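The counter-based stopping test can be sketched as follows. This is a minimal, self-contained simulation of the driver loop: the helper runJobAndGetUpdatedCount stands in for a real Hadoop call such as job.waitForCompletion(true) followed by job.getCounters().findCounter(...).getValue(); the counter name and the halving behavior are purely hypothetical, chosen so the loop converges.

```java
// Simulated iterative driver: each "job" reports, via a counter, how many
// node distances it updated; the loop stops when nothing changed.
public class IterativeDriver {
    // Stand-in for reading a Hadoop counter after a job completes.
    // In a real driver this would be:
    //   job.waitForCompletion(true);
    //   long updated = job.getCounters().findCounter("search", "UPDATED").getValue();
    static long runJobAndGetUpdatedCount(long previous) {
        return previous / 2;  // hypothetical convergence: fewer updates each round
    }

    // Runs "jobs" until the updated-node counter reaches zero;
    // returns how many iterations were needed.
    public static int iterateUntilDone(long initialUpdates) {
        long updated = initialUpdates;
        int iterations = 0;
        while (updated > 0) {
            updated = runJobAndGetUpdatedCount(updated);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // 8 -> 4 -> 2 -> 1 -> 0 takes four simulated jobs.
        System.out.println(iterateUntilDone(8));
    }
}
```

The point of the pattern is that the termination decision lives in one place, the driver, rather than in any shared mutable state visible to the tasks.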
More generally, there are two main ways state is shared between executing nodes (although it should be noted that sharing state is best avoided, since it limits scalability):
- Write a file to HDFS that the other nodes can read (make sure the file gets cleaned up when the job exits, and that speculative execution doesn't cause weird failures).
- Use ZooKeeper to store some data in dedicated ZK tree nodes.
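The first option can be sketched as a "done flag" file. This sketch uses the local filesystem (java.nio.file) as a stand-in for HDFS so it is self-contained; in a real job you would use org.apache.hadoop.fs.FileSystem with its exists, create, and delete methods against an HDFS path, as noted in the comments. The flag-file name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DoneFlag {
    // The driver (or a task) raises the flag when the stop condition is met.
    public static void signalDone(Path flag) throws IOException {
        Files.createFile(flag);            // on HDFS: fs.create(flagPath).close()
    }

    // Other nodes poll for the flag between iterations.
    public static boolean isDone(Path flag) {
        return Files.exists(flag);         // on HDFS: fs.exists(flagPath)
    }

    // Clean up on exit so a stale flag can't stop the next run early.
    public static void clear(Path flag) throws IOException {
        Files.deleteIfExists(flag);        // on HDFS: fs.delete(flagPath, false)
    }
}
```

Note the cleanup step: as the answer warns, a flag left behind by a previous run (or written twice by speculative task attempts) is exactly the kind of "weird failure" this approach invites.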