Hive query shows a few reducers killed but the query is still running. Will the output be correct?


Problem description

I have a complex query with multiple left outer joins that has been running for the last hour on Amazon AWS EMR. A few reducers are shown as Failed and Killed.

My question is: why do some reducers get killed, and will the final output be correct?

Recommended answer

Usually each container gets 3 attempts before it finally fails (the limit is configurable, as @rbyndoor mentioned). If one attempt fails, the task is restarted, and this repeats until the number of attempts reaches the limit. If the last allowed attempt also fails, the whole vertex fails and all its other tasks are killed.
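The retry rule described above can be illustrated with a small simulation. This is only a sketch, not the engine's actual scheduler: the names `run_task` and `run_vertex` are hypothetical, the limit of 3 is taken from the figure quoted above, and tasks run one after another here, whereas a real cluster runs them in parallel and kills the remaining ones when a task exhausts its attempts.

```python
from typing import Callable, List

MAX_ATTEMPTS = 3  # assumed limit, matching the figure quoted in the answer


def run_task(task: Callable[[int], bool], max_attempts: int = MAX_ATTEMPTS) -> List[str]:
    """Run one task, retrying after each failure; return the status of every attempt."""
    statuses = []
    for attempt in range(1, max_attempts + 1):
        if task(attempt):
            statuses.append("SUCCEEDED")
            break  # no further attempts once one succeeds
        statuses.append("FAILED")
    return statuses


def run_vertex(tasks: List[Callable[[int], bool]], max_attempts: int = MAX_ATTEMPTS) -> str:
    """A vertex succeeds only if every task succeeds within the attempt limit."""
    for task in tasks:
        if run_task(task, max_attempts)[-1] != "SUCCEEDED":
            return "FAILED"  # on a real cluster the other running tasks would now be killed
    return "SUCCEEDED"


def flaky(attempt: int) -> bool:
    """Hypothetical task that fails once (e.g. its spot node was removed), then succeeds."""
    return attempt >= 2


def broken(attempt: int) -> bool:
    """Hypothetical task that fails on every attempt."""
    return False


print(run_task(flaky))              # one failed attempt, then success
print(run_vertex([flaky, flaky]))   # failed attempts alone do not fail the vertex
print(run_vertex([flaky, broken]))  # one task exhausting its attempts fails the vertex
```

The point of the sketch is that a failed attempt and a failed task are different things: the query keeps running as long as every task still has attempts left.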

Occasional failures of some task attempts are not a critical issue, especially on an EMR cluster with spot nodes. Spot nodes can be removed during execution, which causes failures and partial restarts of some vertices.

In most cases you can find the reason for the failures in the tracker logs.

And of course this is not a reason to switch to the deprecated MR engine. Try to find the root cause and fix it.

In some marginal cases, even if a job with some failed attempts succeeds, the data it produces may be partially corrupted. One example is using a non-deterministic function such as rand() in the distribute by clause. A restarted container may try to copy data produced by the previous step (the mapper) after the spot node holding the mapper results has already been removed. The containers of that previous step are then restarted as well, but because rand() is non-deterministic, the data they produce may be distributed differently from the first run.
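To see why a non-deterministic distribution key can corrupt the result, consider the following toy simulation. It is a sketch under stated assumptions, not Hive's shuffle implementation: there are two reducers, reducer 0 reads the first mapper run, and reducer 1 reads a rerun of the same mapper after the node was lost. All function names are hypothetical.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 2
ROWS = [f"row{i}" for i in range(20)]


def mapper_output(rows, assign):
    """Split rows into one bucket per reducer using the given assignment function."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        buckets[assign(row)].append(row)
    return buckets


def simulate(assign_first_run, assign_rerun):
    """Reducer 0 consumes the first mapper run, reducer 1 consumes the rerun."""
    first = mapper_output(ROWS, assign_first_run)
    rerun = mapper_output(ROWS, assign_rerun)
    return first[0] + rerun[1]


def by_hash(row):
    """Deterministic key: the same row always goes to the same reducer."""
    return zlib.crc32(row.encode()) % NUM_REDUCERS


def by_rand(rng):
    """Non-deterministic key, standing in for rand(): each run assigns rows differently."""
    return lambda row: rng.randrange(NUM_REDUCERS)


# Deterministic distribution: the rerun reproduces the same buckets, so nothing is lost.
stable = simulate(by_hash, by_hash)
print(sorted(stable) == sorted(ROWS))

# Random distribution: rows can move between reducers across runs,
# so some rows may be counted twice and others may disappear.
unstable = simulate(by_rand(random.Random(1)), by_rand(random.Random(2)))
counts = Counter(unstable)
print("duplicated:", sorted(r for r in ROWS if counts[r] > 1))
print("missing:", sorted(r for r in ROWS if counts[r] == 0))
```

A row that went to reducer 0 in the first run and to reducer 1 in the rerun is processed twice, and a row that moved the other way is never processed. Distributing by a deterministic expression over the row's own columns avoids this, because a rerun then reproduces the same assignment.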

About killed tasks.

Mappers or reducers can be killed for many reasons. First of all, when one of the containers has failed completely, all other running tasks are killed. If speculative execution is switched on, the duplicate attempts are killed once one copy of the task finishes. A task can also be killed if it does not respond for a long time, and so on. This is quite normal and usually does not indicate that something is wrong. If the whole job has failed, or you see many failed attempts, inspect the logs of the failed tasks to find the reason, not the logs of the killed ones.
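As a rough illustration of that triage rule, the sketch below separates failed attempts from killed ones. The records are hypothetical sample data in an invented shape, not the output format of any real tracker UI or log.

```python
from collections import Counter

# Hypothetical attempt records; task names and fields are made up for illustration.
ATTEMPTS = [
    {"task": "reducer_000001", "attempt": 0, "state": "FAILED"},
    {"task": "reducer_000001", "attempt": 1, "state": "SUCCEEDED"},
    {"task": "reducer_000002", "attempt": 0, "state": "KILLED"},
    {"task": "reducer_000002", "attempt": 1, "state": "SUCCEEDED"},
    {"task": "reducer_000003", "attempt": 0, "state": "SUCCEEDED"},
]


def summarize(records):
    """Count attempts per state to judge whether failures are rare or frequent."""
    return Counter(r["state"] for r in records)


def attempts_to_inspect(records):
    """Only FAILED attempts carry the root cause; KILLED ones are usually collateral."""
    return [(r["task"], r["attempt"]) for r in records if r["state"] == "FAILED"]


print(summarize(ATTEMPTS))
print(attempts_to_inspect(ATTEMPTS))
```

With this sample, only the first attempt of `reducer_000001` would be worth opening in the logs.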
