Global variables in Hadoop
Problem description
My program follows an iterative map/reduce approach, and it needs to stop if certain conditions are met. Is there any way I can set a global variable that is distributed across all map/reduce tasks, and check whether that global variable has reached the completion condition?
Something like this.
while (!condition) {
    Configuration conf = getConf();
    Job job = new Job(conf, "Dijkstra Graph Search");
    job.setJarByClass(GraphSearch.class);
    job.setMapperClass(DijkstraMap.class);
    job.setReducerClass(DijkstraReduce.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    job.waitForCompletion(true);
}
Where condition is a global variable that is modified during/after each map/reduce execution.
Solution

Each time you run a map-reduce job, you can examine the state of the output, the values contained in the counters, etc., and make a decision at the node that is controlling the iteration on whether you want one more iteration or not. I guess I don't understand where the need for a global state comes from in your scenario.
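The counter-based check described above can be sketched as driver-side iteration control. This is a minimal sketch: the counter group `"graph"` and counter name `"UPDATED_NODES"` are assumed names, and the actual Hadoop counter read (via `job.getCounters()`) is shown only in comments, so the decision logic itself stays plain, testable Java:

```java
// Sketch of driver-side iteration control for an iterative map/reduce job.
// In a real driver, after job.waitForCompletion(true) you would read:
//   long updated = job.getCounters()
//                     .findCounter("graph", "UPDATED_NODES").getValue();
// ("graph" / "UPDATED_NODES" are assumed names for this example.)
public class IterationControl {

    // Decide whether to run another pass: keep going while the last pass
    // still updated some nodes, with a safety cap on total iterations.
    public static boolean shouldContinue(long updatedNodes,
                                         int iteration,
                                         int maxIterations) {
        return updatedNodes > 0 && iteration < maxIterations;
    }

    public static void main(String[] args) {
        // Simulated counter values reported by three successive jobs.
        long[] counterValues = {7, 3, 0};
        int iteration = 0;
        while (iteration < counterValues.length
                && shouldContinue(counterValues[iteration], iteration, 10)) {
            iteration++;  // here you would configure and submit the next Job
        }
        System.out.println("iterations run: " + iteration);  // prints: iterations run: 2
    }
}
```

The point is that the loop condition lives entirely in the driver; the tasks only increment a counter, and no shared mutable variable is needed.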
More generally -- there are two main ways state is shared between executing nodes (although it should be noted that sharing state is best avoided since it limits scalability).
- Write a file to HDFS that other nodes can read (make sure the file gets cleaned up when the job exits, and that speculative execution won't cause weird failures).
- Use ZooKeeper to store some data in dedicated ZK tree nodes.
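Option 1 above can be sketched as a "flag file" pattern: a reducer that detects convergence creates a marker file, and the driver checks for it between iterations. This sketch uses `java.nio.file` as a local stand-in for Hadoop's `FileSystem` API (the real calls against HDFS would be `fs.exists(path)`, `fs.create(path)`, and `fs.delete(path, false)`); the flag name `CONVERGED` is an assumption for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local sketch of the HDFS flag-file pattern. java.nio.file stands in for
// Hadoop's FileSystem API; "CONVERGED" is an assumed marker-file name.
public class FlagFile {
    private final Path flag;

    public FlagFile(Path dir) {
        this.flag = dir.resolve("CONVERGED");
    }

    // A task that detects convergence creates the flag. The exists-check
    // keeps this idempotent, which matters because speculative execution
    // can run the same task attempt more than once.
    public void signalConverged() throws IOException {
        if (!Files.exists(flag)) {
            Files.createFile(flag);
        }
    }

    // The driver checks the flag between iterations.
    public boolean hasConverged() {
        return Files.exists(flag);
    }

    // Clean up before the next run (and when the job exits), so a stale
    // flag from a previous job cannot end the loop early.
    public void reset() throws IOException {
        Files.deleteIfExists(flag);
    }
}
```

As the caveat in option 1 notes, the cleanup step is not optional: a leftover flag file is exactly the kind of stale shared state that makes this pattern fragile, which is why avoiding shared state altogether (e.g. the counter approach) scales better.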