Hadoop speculative task execution


Question

In Google's MapReduce paper, they have a backup task, which I think is the same thing as the speculative task in Hadoop. How is the speculative task implemented? When a speculative task is started, does it start from the very beginning, like the older, slow one did, or does it start from where the older task has reached (and if so, does it have to copy all the intermediate state and data)?

Answer

One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program.

Tasks may be slow for various reasons, including hardware degradation, or software mis-configuration, but the causes may be hard to detect since the tasks still complete successfully, albeit after a longer time than expected. Hadoop doesn’t try to diagnose and fix slow-running tasks; instead, it tries to detect when a task is running slower than expected and launches another, equivalent, task as a backup. This is termed speculative execution of tasks.

For example, if one node has a slow disk controller, then it may be reading its input at only 10% of the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.

By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.
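A minimal sketch of the "first attempt to finish wins" rule described above. This is illustrative Java only, not Hadoop's actual JobTracker/TaskTracker source; the class and method names are invented for the example:

    import java.util.List;

    // Hypothetical illustration of speculative-execution bookkeeping: the first
    // attempt of a task to report success becomes the definitive copy, and any
    // remaining speculative attempts are killed and their output discarded.
    class SpeculationSketch {

        static class Attempt {
            final String id;
            Attempt(String id) { this.id = id; }
            void kill() { System.out.println("Discarding redundant attempt " + id); }
        }

        // Called when any attempt of a given task reports completion.
        static void onAttemptFinished(Attempt winner, List<Attempt> attemptsOfSameTask) {
            System.out.println("Attempt " + winner.id + " is now the definitive copy");
            for (Attempt a : attemptsOfSameTask) {
                if (a != winner) {
                    a.kill(); // abandon the speculative duplicates and their output
                }
            }
        }
    }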

Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively, when using the old API; with the newer API you can change mapreduce.map.speculative and mapreduce.reduce.speculative instead.
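For example, here is a minimal sketch of turning speculative execution off for a single job with the newer mapreduce API. The property names are the ones mentioned above; the class and job names are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class NoSpeculationJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Newer API property names: disable speculative map and reduce attempts.
            conf.setBoolean("mapreduce.map.speculative", false);
            conf.setBoolean("mapreduce.reduce.speculative", false);

            // Old API equivalents (JobConf-era names), shown for reference:
            // conf.setBoolean("mapred.map.tasks.speculative.execution", false);
            // conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

            Job job = Job.getInstance(conf, "job-without-speculation"); // placeholder job name
            // ... configure mapper, reducer, input and output paths as usual ...
        }
    }

The same properties can also be set cluster-wide in mapred-site.xml, or passed per run as -D options (e.g. -D mapreduce.map.speculative=false) when the job is launched through ToolRunner/GenericOptionsParser.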

So, to answer your question: the speculative task does start afresh, and has nothing to do with how much the older task has already done or completed.

Reference: http://developer.yahoo.com/hadoop/tutorial/module4.html
