Unable to understand error "SparkListenerBus has already stopped! Dropping event ..."


Question

I'd like to know if anyone has a magic method to avoid such messages in the Spark logs:

2015-08-30 19:30:44 ERROR LiveListenerBus:75 - SparkListenerBus has already
stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())

After further investigation, I understand that LiveListenerBus extends AsynchronousListenerBus, and thus at some point its .stop() method is called. After that, any messages that are still being sent or received are dropped and remain unprocessed. Basically, some SparkListenerExecutorMetricsUpdate messages have unfortunately not been received yet, and once they arrive they are dropped to nowhere.

This doesn't look critical, since SparkListenerExecutorMetricsUpdate messages just correspond to periodic updates from the executors.

What is embarrassing is that I absolutely don't understand why this happens, and nothing refers to this issue. Note that this is totally non-deterministic and I can't reproduce it, probably due to the asynchronous nature and my lack of understanding of how/when stop() is supposed to be called.

Detailed sample:

val sc = new SparkContext(sparkConf)
// Build one accumulator per metric, keyed by the metric value
val metricsMap = Metrics.values.toSeq.map(
    v => v -> sc.accumulator(0, v.toString)
).toMap
val outFiles = sc.textFile(outPaths)

And there's no other reference to sc or the SparkContext instance.
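The dropping behavior described above can be sketched with a small toy model. This is purely an illustration under my own simplifying assumptions, not Spark's actual implementation: an asynchronous bus whose post() silently discards events once stop() has been called, just as the real bus does when a late SparkListenerExecutorMetricsUpdate arrives.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Toy model (NOT Spark's real code) of an asynchronous listener bus:
// once stop() has been called, any event posted afterwards is dropped,
// which is what the "Dropping event ..." log line reports.
class ToyListenerBus {
  private val eventQueue   = new LinkedBlockingQueue[String]()
  private val stopped      = new AtomicBoolean(false)
  private val droppedCount = new AtomicInteger(0)

  def post(event: String): Unit = {
    if (stopped.get) {
      // This is where Spark would log:
      // "SparkListenerBus has already stopped! Dropping event <event>"
      droppedCount.incrementAndGet()
    } else {
      eventQueue.put(event) // a consumer thread would drain this queue
    }
  }

  def stop(): Unit = stopped.set(true)

  def dropped: Int = droppedCount.get
}
```

Because posting and stopping happen on different threads in the real bus, a periodic executor metrics update can race with stop(), which would match the non-deterministic behavior described above.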

Answer

This ticket might be related: https://issues.apache.org/jira/browse/SPARK-12009

The message seems to indicate a YARN allocation failure after the SparkContext stops.

Sorry for the unclear comment.

The main reason seems to be that there is some interval between the AM's shutdown event and all the executors stopping.
So the AM tries to reallocate after the executors stop.

As Saisai said below,

An interesting thing is that the AM is shutting down at 2015-11-26 03:05:16, but the YarnAllocator still requests 13 executors 11 seconds later. It looks like the AM does not exit that fast, which is why the YarnAllocator is still requesting new containers. Normally, if the AM exited as soon as it received the disconnected message, there would be no time for the YarnAllocator to request containers.

I have sometimes come across similar logs near the end of a Spark context's lifetime.
In my case, this ticket seems to be the answer.

