Spark tasks stuck at RUNNING

Problem Description

I'm trying to run a Spark ML pipeline (load some data from JDBC, run some transformers, train a model) on my Yarn cluster, but each time I run it, a few of my executors (sometimes one, sometimes 3 or 4) get stuck running their first task set (that'd be 3 tasks for each of their 3 cores), while the rest run normally, checking off 3 at a time.

In the UI, you'd see something like this: [Spark UI screenshot not preserved]

Some things I have observed so far:

  • When I set up my executors to use 1 core each with spark.executor.cores (i.e. run 1 task at a time), the issue does not occur (see the config sketch after this list);
  • The stuck executors always seem to be the ones that had to have some partitions shuffled to them in order to run the task;
  • The stuck tasks would ultimately get executed successfully, speculatively, by another instance;
  • Occasionally, a single task would get stuck in an otherwise normal executor, while its other 2 cores kept working fine;
  • The stuck executor instances look like everything is normal: CPU is at ~100%, plenty of memory to spare, the JVM processes are alive, neither Spark nor Yarn logs anything out of the ordinary, and they can still receive instructions from the driver, such as "drop this task, someone else speculatively executed it already"; for some reason, though, they don't drop it;
  • Those executors never get killed off by the driver, so I imagine they keep sending their heartbeats just fine.
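For reference, here's roughly how the job is set up. The 3 cores per executor come straight from the observations above; the builder-style calls and the app name are a simplified sketch, and spark.speculation=true is inferred from the fact that speculative copies run at all:

    import org.apache.spark.sql.SparkSession

    // Illustrative setup matching the symptoms described above.
    val spark = SparkSession.builder()
      .appName("ml-pipeline") // hypothetical app name
      // 3 task threads run concurrently inside each executor JVM;
      // setting this to "1" makes the problem disappear, per the first observation.
      .config("spark.executor.cores", "3")
      // lets a stuck task be re-attempted on another executor, which is why
      // the stuck tasks eventually succeed speculatively elsewhere.
      .config("spark.speculation", "true")
      .getOrCreate()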

Any ideas as to what may be causing this or what I should try?

Recommended Answer

TLDR: Make sure your code is thread-safe and free of race conditions before you blame Spark.

Figured it out. For posterity: I was using a thread-unsafe data structure (a mutable HashMap). Since all the task threads within an executor share its JVM, this was resulting in data races that were locking up the separate threads/tasks.
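In sketch form, the offending pattern is roughly this; FeatureCache and expensiveComputation are hypothetical names, not the actual code:

    import scala.collection.mutable

    // HYPOTHETICAL sketch of the bug: a mutable HashMap held in a shared
    // object (or captured by a closure), hit by every task thread in the
    // executor JVM.
    object FeatureCache {
      val cache = mutable.HashMap.empty[String, Double] // NOT thread-safe

      def expensiveComputation(key: String): Double =
        key.length.toDouble // stand-in for the real work

      // Called from inside a transformer/UDF. With spark.executor.cores > 1,
      // several task threads mutate the map concurrently; an unsynchronized
      // hash map can corrupt its internal state under concurrent writes and
      // leave a thread spinning at ~100% CPU, which matches the "stuck but
      // alive" behaviour in the question.
      def lookup(key: String): Double =
        cache.getOrElseUpdate(key, expensiveComputation(key))
    }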

The upshot: when you have spark.executor.cores > 1 (and you probably should), make sure your code is thread-safe.
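One way to make that pattern safe, using the same hypothetical names as above (scala.collection.concurrent.TrieMap here; a java.util.concurrent.ConcurrentHashMap or per-partition local state would do just as well):

    import scala.collection.concurrent.TrieMap

    object FeatureCache {
      // A lock-free concurrent map: safe for all task threads sharing the
      // executor JVM, so concurrent updates can no longer corrupt it.
      private val cache = TrieMap.empty[String, Double]

      def expensiveComputation(key: String): Double =
        key.length.toDouble // stand-in for the real work

      def lookup(key: String): Double =
        cache.getOrElseUpdate(key, expensiveComputation(key))
    }

Alternatively, building such state inside mapPartitions keeps it local to a single task and sidesteps cross-thread sharing entirely.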
