Spark-submit can't locate local file

Problem Description

I've written a very simple python script for testing my spark streaming idea, and plan to run it on my local machine to mess around a little bit. Here is the command line:

spark-submit spark_streaming.py localhost 9999

But the terminal threw me an error:

Error executing Jupyter command '<the/spark_streaming.py/file/path>': [Errno 2] No such file or directory

I have no idea why this would happen, and I'm sure the .py file does exist.
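The question doesn't show the script itself, but a script invoked with a host and a port like this is typically something along the lines of Spark's classic network word-count streaming example. A minimal, purely hypothetical sketch (assuming the old DStream API):

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Hypothetical spark_streaming.py: word count over a socket stream,
# reading host and port from the command-line arguments (e.g. localhost 9999)
sc = SparkContext(appName="NetworkWordCount")
ssc = StreamingContext(sc, 1)  # 1-second batches

lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()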

Using python instead of spark-submit.

And also, the lines added in the .bashrc file:

export PATH="/usr/local/spark/bin:$PATH"
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export SPARK_LOCAL_IP=localhost

Recommended Answer

Supposing you want to spark-submit to YARN a Python script located at /home/user/scripts/spark_streaming.py, the correct syntax is as follows:

spark-submit --master yarn --deploy-mode client /home/user/scripts/spark_streaming.py

You can interchange the ordering of the various flags, but the script itself must come last; if your script accepts arguments, they should follow the script name (e.g. see this example for calculating pi with 10 decimal digits).
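Applied to the script and arguments from the question (path assumed as above), that ordering would look like this:

spark-submit --master yarn --deploy-mode client /home/user/scripts/spark_streaming.py localhost 9999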

For executing locally with, say, 2 cores, you should use --master local[2]; use --master local[*] for all available local cores (no deploy-mode flag in either case).
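For instance, sticking with the same (assumed) path and arguments:

spark-submit --master local[2] /home/user/scripts/spark_streaming.py localhost 9999
spark-submit --master "local[*]" /home/user/scripts/spark_streaming.py localhost 9999

(Quoting local[*] simply guards against accidental shell glob expansion.)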

Check the docs for more info (although admittedly they are rather poor in pyspark demonstrations).

PS The mention of Jupyter, as well as the path shown in your error message, are extremely puzzling...

UPDATE: It seems that PYSPARK_DRIVER_PYTHON=jupyter messes everything up, funneling the execution through Jupyter (which is undesirable here, and may explain the weird error message). Try modifying the environment variables in your .bashrc as follows:

export SPARK_HOME="/usr/local/spark"  # do not include /bin
export PYSPARK_PYTHON=python
export PYSPARK_DRIVER_PYTHON=python
export PYSPARK_DRIVER_PYTHON_OPTS=""

and then source the .bashrc.
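A quick sanity check after sourcing (assuming the file lives at ~/.bashrc):

source ~/.bashrc
echo $PYSPARK_DRIVER_PYTHON    # should now print: python

and then re-run the spark-submit command.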
