Global variables in Hadoop




    My program follows an iterative map/reduce approach, and it needs to stop if certain conditions are met. Is there any way I can set a global variable that can be distributed across all map/reduce tasks, and check whether the global variable has reached the condition for completion?

    Something like this:

        while (condition != true) {

            Configuration conf = getConf();
            Job job = new Job(conf, "Dijkstra Graph Search");

            job.setJarByClass(GraphSearch.class);
            job.setMapperClass(DijkstraMap.class);
            job.setReducerClass(DijkstraReduce.class);

            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Text.class);

            // submit the job and block until it finishes
            job.waitForCompletion(true);
        }

    Where condition is a global variable that is modified during/after each map/reduce execution.

    Solution

    Each time you run a map-reduce job, you can examine the state of the output, the values contained in the counters, and so on, and then decide at the node that is controlling the iteration whether you want one more iteration or not. I guess I don't understand where the need for a global state comes from in your scenario.
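For example, the driver can read a job counter after each iteration and stop once no more updates were made. A minimal sketch, assuming a hypothetical `SearchCounter.UPDATED_NODES` counter that the reducer increments whenever it still changes a value (the enum and counter name are illustrative, not part of the original code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

public class IterativeDriver {

    // Hypothetical counter: the reducer increments it whenever it updates
    // a node, i.e. whenever another iteration is still needed.
    public enum SearchCounter { UPDATED_NODES }

    public static void main(String[] args) throws Exception {
        long updated = 1;
        while (updated > 0) {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "Dijkstra Graph Search");
            // ... set jar, mapper, reducer and output classes as in the question ...

            // run this iteration to completion
            job.waitForCompletion(true);

            // read the counter back in the driver and decide whether to loop again
            Counters counters = job.getCounters();
            updated = counters.findCounter(SearchCounter.UPDATED_NODES).getValue();
        }
    }
}
```

Inside the mapper or reducer, the counter would be incremented with `context.getCounter(SearchCounter.UPDATED_NODES).increment(1)`.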

    More generally -- there are two main ways state is shared between executing nodes (although it should be noted that sharing state is best avoided since it limits scalability).

    1. Write a file to HDFS that other nodes can read (make sure the file gets cleaned up when the job exits, and that speculative execution won't cause weird failures).
    2. Use ZooKeeper to store some data in dedicated ZK tree nodes.
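As an illustration of the first option, a "done" flag file on HDFS could be used like this (the path, class, and method names are assumptions for the sketch, not part of the original answer):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DoneFlag {

    // Hypothetical marker location; pick a path owned by your job.
    private static final Path DONE_FLAG = new Path("/tmp/dijkstra/_done");

    // Called by the node that decides the search has converged.
    public static void markDone(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        fs.create(DONE_FLAG).close(); // an empty marker file is enough
    }

    // Called by any node (or the driver) to check for completion.
    public static boolean isDone(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(DONE_FLAG);
    }

    // Clean up the marker when the overall job exits, as advised above.
    public static void cleanUp(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        fs.delete(DONE_FLAG, false);
    }
}
```

Note that with speculative execution enabled, more than one attempt of the same task may run at once, so a task writing such a marker must tolerate the file already existing.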
