Spark-submit can't locate local file

Problem description

I've written a very simple python script for testing my spark streaming idea, and plan to run it on my local machine to mess around a little bit. Here is the command line:

spark-submit spark_streaming.py localhost 9999

But the terminal threw me an error:

Error executing Jupyter command '<the/spark_streaming.py/file/path>': [Errno 2] No such file or directory

I have no idea why this would happen, and I'm sure the .py file does exist.

Using python instead of spark-submit

And also, the lines added in the .bashrc file:

export PATH="/usr/local/spark/bin:$PATH"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export SPARK_LOCAL_IP=localhost

Recommended answer

Supposing you want to spark-submit to YARN a Python script located at /home/user/scripts/spark_streaming.py, the correct syntax is as follows:

spark-submit --master yarn --deploy-mode client /home/user/scripts/spark_streaming.py

You can interchange the ordering of the various flags, but the script itself must be at the end; if your script accepts arguments, they should follow the script name (e.g. see this example for calculating pi with 10 decimal digits).
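
For illustration only (the actual script from the question isn't shown), a minimal spark_streaming.py that reads the host and port passed after the script name might look like this:

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    # host and port come from the positional arguments that follow the
    # script name, e.g. spark-submit ... spark_streaming.py localhost 9999
    host, port = sys.argv[1], int(sys.argv[2])
    sc = SparkContext(appName="StreamingTest")  # app name is arbitrary here
    ssc = StreamingContext(sc, 1)               # 1-second micro-batches
    lines = ssc.socketTextStream(host, port)
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()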

For executing locally with, say, 2 cores, you should use --master local[2]; for all available local cores, use --master local[*] (no deploy-mode flag in either case).
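
Putting this together for the scenario in the question (path and arguments taken from above), a local run on two cores would be something like:

spark-submit --master local[2] /home/user/scripts/spark_streaming.py localhost 9999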

Check the docs for more info (although admittedly they are rather poor in pyspark demonstrations).

PS The mention of Jupyter, as well as the path shown in your error message, are extremely puzzling...

UPDATE: Seems that PYSPARK_DRIVER_PYTHON=jupyter messes up everything, funneling the execution through Jupyter (which is undesirable here, and it may explain the weird error message). Try modifying the environment variables in your .bashrc as follows:

export SPARK_HOME="/usr/local/spark"  # do not include /bin
export PYSPARK_PYTHON=python
export PYSPARK_DRIVER_PYTHON=python
export PYSPARK_DRIVER_PYTHON_OPTS=""

and then source .bashrc so the changes take effect.
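
A quick sanity check after sourcing (assuming .bashrc lives in your home directory) is to confirm that the driver is no longer pointed at Jupyter:

source ~/.bashrc
echo $PYSPARK_DRIVER_PYTHON   # should now print "python"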
