Spark job restarted after showing all jobs completed and then fails (TimeoutException: Futures timed out after [300 seconds])


Problem Description

I'm running a Spark job. It shows that all of the jobs were completed.

However, after a couple of minutes the entire job restarts; that time it shows all jobs and tasks as completed too, but after a couple of minutes it fails. I found this exception in the logs:

java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]

This happens when I'm trying to join two pretty big tables: one of 3B rows and the other of 200M rows. When I run show(100) on the resulting dataframe, everything gets evaluated and I hit this issue.
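For context, a minimal sketch of the kind of join involved (the table names, column names, and the user_id join key are hypothetical placeholders, using the Spark 1.4 Scala DataFrame API):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("bigJoin"))
val sqlContext = new HiveContext(sc)

// Two large inputs (placeholder names): ~3B rows and ~200M rows
val big   = sqlContext.table("events")
val small = sqlContext.table("users")

// The join itself is lazy; show(100) forces the whole plan to be evaluated
val joined = big.join(small, big("user_id") === small("user_id"))
joined.show(100)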

I tried playing around with increasing/decreasing the number of partitions, and I changed the garbage collector to G1 with an increased number of threads. I changed spark.sql.broadcastTimeout to 600 (which made the timeout message change to 600 seconds).
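For reference, a sketch of how that timeout can be raised (setConf takes the value in seconds, as a string; sqlContext is the SQLContext/HiveContext from the snippet above):

// Raise the broadcast-join timeout from the 300-second default
sqlContext.setConf("spark.sql.broadcastTimeout", "600")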

I also read that this might be a communication issue; however, other show() clauses that run prior to this code segment work without problems, so that's probably not it.

This is the submit command:

/opt/spark/spark-1.4.1-bin-hadoop2.3/bin/spark-submit \
  --master yarn-cluster \
  --class className \
  --executor-memory 12g \
  --executor-cores 2 \
  --driver-memory 32g \
  --driver-cores 8 \
  --num-executors 40 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ConcGCThreads=20" \
  /home/asdf/fileName-assembly-1.0.jar

You can get an idea of the Spark version and the resources used from there.

Where do I go from here? Any help would be appreciated, and code segments/additional logging will be provided if needed.

Answer

What eventually solved this was persisting both data frames before the join.

I looked at the execution plan before and after persisting the data frames, and the strange thing was that before persisting, Spark tried to perform a BroadcastHashJoin, which clearly failed due to the large size of the data frames; after persisting, the execution plan showed that the join would be a ShuffleHashJoin, which completed without any issues whatsoever. A bug? Maybe; I'll try with a newer Spark version when I get to it.
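A minimal sketch of that fix, under the same placeholder assumptions as the earlier snippet (persist() and explain() are standard DataFrame methods in Spark 1.4):

import org.apache.spark.storage.StorageLevel

// Materialize both sides before joining; after persisting, the physical
// plan showed a ShuffleHashJoin instead of the failing BroadcastHashJoin
val bigP   = big.persist(StorageLevel.MEMORY_AND_DISK)
val smallP = small.persist(StorageLevel.MEMORY_AND_DISK)

val joined = bigP.join(smallP, bigP("user_id") === smallP("user_id"))

joined.explain()   // inspect the physical plan before triggering the job
joined.show(100)

Another way to steer the planner away from broadcasting is setting spark.sql.autoBroadcastJoinThreshold to -1, which disables automatic broadcast joins entirely.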

