Intermittent Timeout Exception using Spark


Problem Description

I have a Spark cluster with 10 nodes, and I'm getting this exception after using the Spark context for the first time:

14/11/20 11:15:13 ERROR UserGroupInformation: PriviledgedActionException as:iuberdata (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    ... 4 more

This person had a similar problem, but I've already tried his solution and it didn't work.

The same exception also comes up in this mailing-list thread (http://mail-archives.apache.org/mod_mbox/spark-user/201409.mbox/%3CCAMc-71miSp8r-p1tnxEjharnRTUhf1RBtLu2aLzyhbgA042-0A@mail.gmail.com%3E), but the problem there isn't the same as here, as I'm using Spark version 1.1.0 on both the master and the slaves, and on the client.

I've tried increasing the timeout to 120 s, but it still doesn't solve the problem.
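For reference, in Spark 1.x the timeouts behind "Futures timed out" errors are governed by settings such as spark.akka.timeout and spark.core.connection.ack.wait.timeout, which can be raised at submit time. This is only a hedged sketch: the values, class name, and jar path are placeholders, and raising timeouts masks rather than fixes an underlying connectivity problem.

```shell
# Raise the Akka and connection-ack timeouts (seconds) at submit time.
# com.example.MyApp and myapp.jar are placeholder names.
spark-submit \
  --conf spark.akka.timeout=300 \
  --conf spark.core.connection.ack.wait.timeout=300 \
  --class com.example.MyApp \
  myapp.jar
```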

I'm deploying the environment through scripts, and I'm using context.addJar to include my code in the classpath. This problem is intermittent, and I have no idea how to track down why it's happening. Has anybody faced this issue when configuring a Spark cluster, and do you know how to solve it?
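As an aside, the jar distribution that context.addJar performs at runtime can also be done at submit time with spark-submit's --jars flag, which takes the dependencies out of the application code. A hedged sketch, with placeholder paths:

```shell
# Ship dependency jars to the executors at submit time instead of
# calling context.addJar from the driver. All paths are placeholders.
spark-submit \
  --jars deps/lib-a.jar,deps/lib-b.jar \
  --class com.example.MyApp \
  myapp.jar
```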

Recommended Answer

The firewall was misconfigured and, in some instances, it didn't allow the slaves to connect to the cluster. This caused the timeout issue, as the slaves couldn't connect to the server. If you are facing this timeout, check your firewall configs.
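One quick way to verify this is to probe the master's port from each slave. The sketch below assumes a standalone master reachable as "spark-master" on the default port 7077 (both are placeholder values; substitute your own host and port). It uses bash's built-in /dev/tcp redirection, so no extra tools are needed.

```shell
# Probe a TCP port from this machine; prints whether it is reachable.
check_port() {
  local host=$1 port=$2
  # bash opens /dev/tcp/<host>/<port> as a TCP connection attempt
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "port ${port} on ${host}: reachable"
  else
    echo "port ${port} on ${host}: blocked or unreachable - check firewall rules"
  fi
}

# spark-master and 7077 are placeholders for your master host and port.
result=$(check_port spark-master 7077)
echo "$result"
```

If the port shows as blocked from a slave but the master process is running, the firewall (or a security-group rule) between the two hosts is the likely culprit.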

