Running Spark on cluster: Initial job has not accepted any resources


Problem description


  1. I have a remote Ubuntu server on linode.com with 4 cores and 8 GB of RAM.
  2. I have a Spark 2 cluster containing 1 master and 1 slave on my remote Ubuntu server.
  3. I have started a PySpark shell locally on my MacBook and connected it to my master node on the remote server with:

$ PYSPARK_PYTHON=python3 /vagrant/spark-2.0.0-bin-hadoop2.7/bin/pyspark --master spark://[server-ip]:7077

  4. I tried executing the simple Spark example from the website:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()
df = spark.read.json("/path/to/spark-2.0.0-bin-hadoop2.7/examples/src/main/resources/people.json")
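
In the quickstart this is followed by a simple action on the DataFrame; a minimal continuation of the example above (with the cluster in the state described, any such job just hangs and keeps printing the warning shown below):

# Actions like these schedule jobs on the cluster; with no executors granted
# to the application, they hang and the resource warning is printed repeatedly.
df.printSchema()
df.show()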




  5. I got an error:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

    There is enough memory on both my server and my local machine, yet I keep getting this strange error again and again. My Spark cluster has 6 GB in total, and my script only uses 4 cores with 1 GB of memory per node.

(Screenshot of the Spark admin UI)
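
Resource caps like the ones mentioned above are typically expressed through standard Spark properties, either on the command line or on the SparkSession builder. A minimal sketch, assuming the same master URL as above; the app name and the exact values are illustrative and simply mirror the numbers in the question:

from pyspark.sql import SparkSession

# Illustrative only: cap the application at 4 cores in total and give each
# executor 1 GB of memory.  spark.cores.max and spark.executor.memory are
# standard Spark properties; the app name is made up for this example.
spark = SparkSession \
    .builder \
    .appName("resource-limits-example") \
    .master("spark://[server-ip]:7077") \
    .config("spark.cores.max", "4") \
    .config("spark.executor.memory", "1g") \
    .getOrCreate()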


  6. I have Googled this error and tried setting up different memory configurations, and also disabled the firewall on both machines, but it did not help. I have no idea how to fix it.

Has anyone faced the same problem? Any ideas?


Recommended answer

You are submitting the application in client mode. This means the driver process is started on your local machine.
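
If you want to double-check from the shell which master and deploy mode the session is actually using, the standard SparkConf accessors are enough (a minimal sketch; the commented values are what this particular setup would be expected to show):

# Inspect how the current session was submitted.
print(spark.sparkContext.getConf().get("spark.master"))                       # spark://[server-ip]:7077
print(spark.sparkContext.getConf().get("spark.submit.deployMode", "client"))  # client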

When executing Spark applications, all machines have to be able to communicate with each other. Most likely your driver process is not reachable from the executors (for example, it is using a private IP or is hidden behind a firewall). If that is the case, you can confirm it by checking the executor logs: go to the application, select one of the workers with the status EXITED, and check stderr. You should see the executor failing due to org.apache.spark.rpc.RpcTimeoutException.

There are two possible solutions:


  • Submit the application from a machine that is reachable from within the cluster.
  • Submit the application in cluster mode. This uses cluster resources to start the driver process, so you have to take that into account.
