Running Spark on cluster: Initial job has not accepted any resources


Problem Description

  1. I have a remote Ubuntu server on linode.com with 4 cores and 8 GB of RAM.
  2. I have a Spark 2 cluster consisting of 1 master and 1 slave on my remote Ubuntu server.
  3. I started the PySpark shell locally on my MacBook and connected to the master node on the remote server via:

$ PYSPARK_PYTHON=python3 /vagrant/spark-2.0.0-bin-hadoop2.7/bin/pyspark --master spark://[server-ip]:7077

  • I tried executing the simple Spark example from the website:

    from pyspark.sql import SparkSession

    # Create (or reuse) the SparkSession -- the entry point for the DataFrame API
    spark = SparkSession \
        .builder \
        .appName("Python Spark SQL basic example") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()

    # Reading the sample JSON file triggers a Spark job; this is where the
    # "Initial job has not accepted any resources" warning shows up
    df = spark.read.json("/path/to/spark-2.0.0-bin-hadoop2.7/examples/src/main/resources/people.json")
    

  • I got this error:

    Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

  • I have enough memory on my server and also on my local machine, but I keep getting this weird error. My Spark cluster has 6 GB in total, and my script uses only 4 cores with 1 GB of memory per node (a launch-command sketch follows below).

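For reference, resource limits like these are normally set when the shell or application is launched. A minimal sketch, extending the pyspark command above with standard standalone-mode flags (the values shown are assumptions matching the setup described here, not values from the original post):

$ PYSPARK_PYTHON=python3 /vagrant/spark-2.0.0-bin-hadoop2.7/bin/pyspark \
    --master spark://[server-ip]:7077 \
    --executor-memory 1G \
    --total-executor-cores 4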

I have Googled this error and tried setting different memory configs, and also disabled the firewall on both machines, but it did not help. I have no idea how to fix it.

Has anyone faced the same problem? Any ideas?

Recommended Answer

You are submitting the application in client mode. This means that the driver process is started on your local machine.

When executing Spark applications, all machines have to be able to communicate with each other. Most likely your driver process is not reachable from the executors (for example, it is using a private IP or is hidden behind a firewall). If that is the case, you can confirm it by checking the executor logs: go to the application, select one of the workers with status EXITED, and check its stderr. You should see the executor failing with an org.apache.spark.rpc.RpcTimeoutException.
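
Although the answer does not spell out the configuration, a common way to make a locally running driver reachable is to bind it explicitly to an address and port the executors can route to, via the spark.driver.host and spark.driver.port properties. A minimal sketch (the bracketed value is a placeholder for an address the executors can reach, and the chosen port must also be open in your firewall):

$ PYSPARK_PYTHON=python3 /vagrant/spark-2.0.0-bin-hadoop2.7/bin/pyspark \
    --master spark://[server-ip]:7077 \
    --conf spark.driver.host=[your-public-ip] \
    --conf spark.driver.port=51000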

There are two possible solutions:

• Submit the application from a machine that is reachable from within your cluster's network.
• Submit the application in cluster mode. This will use cluster resources to start the driver process, so you have to take that into account (see the sketch below).
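
For the second option, the general shape of a cluster-mode submission is sketched below (app.py is a hypothetical application file; note that Spark's standalone manager does not support cluster deploy mode for Python applications, so with this exact setup you would need a Scala/Java jar or a cluster manager such as YARN):

$ /vagrant/spark-2.0.0-bin-hadoop2.7/bin/spark-submit \
    --master spark://[server-ip]:7077 \
    --deploy-mode cluster \
    app.py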

