Spark job restarted after showing all jobs completed and then fails (TimeoutException: Futures timed out after [300 seconds])


Problem description

I'm running a Spark job. The UI shows that all of the jobs were completed:

However, after a couple of minutes the entire job restarts; this time it again shows all jobs and tasks as completed, but after a couple more minutes it fails. I found this exception in the logs:

java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]

This happens when I'm trying to join two pretty big tables: one of 3B rows and the second of 200M rows. When I run show(100) on the resulting dataframe, everything gets evaluated and I hit this issue.
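For context, the failing code is roughly shaped like this (a minimal sketch only; the dataframe names bigDf/smallDf and the join key id are hypothetical, since the original snippet isn't shown):

// Minimal sketch of the scenario; dataframe names and the join key are hypothetical.
// bigDf has ~3B rows, smallDf has ~200M rows.
val joined = bigDf.join(smallDf, bigDf("id") === smallDf("id"))
joined.show(100)  // forces evaluation of the join; this is where the timeout surfaces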

I tried increasing and decreasing the number of partitions, and I changed the garbage collector to G1 with an increased number of concurrent threads. I also changed spark.sql.broadcastTimeout to 600 (which only changed the timeout message to 600 seconds).
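A sketch of how those settings can be applied with the Spark 1.x SQLContext API (the broadcastTimeout value is from the question; the partition count is just an illustrative value, not the one actually used):

sqlContext.setConf("spark.sql.broadcastTimeout", "600")    // timeout message then reports 600 seconds
sqlContext.setConf("spark.sql.shuffle.partitions", "400")  // example value; higher and lower counts were tried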

I also read that this might be a communication issue; however, other show() calls that run prior to this code segment work without problems, so that's probably not it.

This is the submit command:

/opt/spark/spark-1.4.1-bin-hadoop2.3/bin/spark-submit  --master yarn-cluster --class className --executor-memory 12g --executor-cores 2 --driver-memory 32g --driver-cores 8 --num-executors 40 --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ConcGCThreads=20" /home/asdf/fileName-assembly-1.0.jar

You can get an idea of the Spark version and the resources used from it.

Where do I go from here? Any help would be appreciated; I can provide code segments and additional logging if needed.

Recommended answer

What eventually solved this was persisting both data frames before the join.
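A minimal sketch of that fix, assuming the same hypothetical dataframes as above (the answer does not show the actual code, and the storage level here is an assumption):

import org.apache.spark.storage.StorageLevel

// Mark both dataframes for caching before the join; MEMORY_AND_DISK is an assumed storage level.
val bigCached   = bigDf.persist(StorageLevel.MEMORY_AND_DISK)
val smallCached = smallDf.persist(StorageLevel.MEMORY_AND_DISK)

val joined = bigCached.join(smallCached, bigCached("id") === smallCached("id"))
joined.show(100)  // now planned as a shuffle join rather than a broadcast join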

I looked at the execution plan before and after persisting the data frames, and the strange thing was that before persisting, Spark tried to perform a BroadcastHashJoin, which clearly failed due to the large size of the data frames; after persisting, the execution plan showed that the join would be a ShuffleHashJoin, which completed without any issues whatsoever. A bug? Maybe; I'll try with a newer Spark version when I get to it.
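The plan change can be checked directly, and the broadcast attempt can also be turned off explicitly; this is an added suggestion, not something the original answer did:

joined.explain()  // prints the physical plan: look for BroadcastHashJoin vs. a shuffle-based join
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")  // -1 disables automatic broadcast joins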
