PySpark in Eclipse: using PyDev
Question
I am running a local PySpark script from the command line, and it works:
/Users/edamame/local-lib/apache-spark/spark-1.5.1/bin/pyspark --jars myJar.jar --driver-class-path myJar.jar --executor-memory 2G --driver-memory 4G --executor-cores 3 /myPath/myProject.py
Is it possible to run this code from Eclipse using PyDev? What arguments are required in the Run Configuration? I tried and got the following errors:
Traceback (most recent call last):
File "/myPath/myProject.py", line 587, in <module>
main()
File "/myPath/myProject.py", line 506, in main
conf = SparkConf()
File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/conf.py", line 104, in __init__
SparkContext._ensure_initialized()
File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/context.py", line 234, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/java_gateway.py", line 76, in launch_gateway
proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1308, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Does anyone have any idea? Thank you very much!
Answer
The OSError: [Errno 2] above is raised when Popen cannot find Spark's bin/spark-submit launcher, which usually means SPARK_HOME is missing or wrong in the environment Eclipse runs the script with. Considering the following prerequisites:
- Eclipse with PyDev and Spark installed.
- PyDev configured with a Python interpreter.
- PyDev configured with the Python sources of Spark (see the sketch after this list).
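For that last prerequisite, Spark's Python sources are usually registered in the PyDev interpreter's libraries, but a minimal bootstrap sketch like the one below can also make them importable from the script itself. It assumes SPARK_HOME is set, and the py4j zip name is release-specific (Spark 1.5.1 ships py4j-0.8.2.1-src.zip; check your python/lib directory):

# bootstrap sketch -- assumes SPARK_HOME points at the Spark installation
# and that the py4j zip name matches your release.
import os
import sys

spark_home = os.environ["SPARK_HOME"]  # e.g. /path/to/apache-spark/spark-1.5.1
sys.path.insert(0, os.path.join(spark_home, "python"))
sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.8.2.1-src.zip"))

from pyspark import SparkConf  # should now resolve without errors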
Here is what you need to do:
From the Eclipse IDE, check that you are in the PyDev perspective:
- On Mac: Eclipse > Preferences
- On Linux: Window > Preferences

From the Preferences window, go to PyDev > Interpreters > Python Interpreter:
- Click on the central button [Environment]
- Click on the button [New...] to add a new environment variable.
- Add the environment variable SPARK_HOME and validate:
- Name: SPARK_HOME, Value: /path/to/apache-spark/spark-1.5.1/
- Note: don't use system environment variables such as $SPARK_HOME
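With SPARK_HOME set this way, a minimal script run from PyDev should be able to launch the gateway. The following sketch (the file and app names are mine) just verifies that:

# pydev_check.py -- a minimal sketch to verify the PyDev run configuration.
import os
from pyspark import SparkConf, SparkContext

# Should print the path configured in PyDev, not None.
print("SPARK_HOME =", os.environ.get("SPARK_HOME"))

conf = SparkConf().setAppName("pydev-check").setMaster("local[*]")
sc = SparkContext(conf=conf)  # this is where the original OSError was raised
print(sc.parallelize(range(10)).sum())  # expect 45
sc.stop()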
I also recommend that you handle your own log4j.properties file in each of your projects.
To do so, you'll need to add the environment variable SPARK_CONF_DIR as done previously, for example:
Name: SPARK_CONF_DIR, Value: ${project_loc}/conf
If you experience problems with the ${project_loc} variable (e.g. on Linux), specify an absolute path instead.
Or, if you want to keep ${project_loc}, right-click on each Python source and choose Run As > Run Configurations..., then create your SPARK_CONF_DIR variable in the Environment tab as described previously.
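For reference, a minimal log4j.properties sketch to drop into ${project_loc}/conf, adapted from Spark's bundled log4j.properties.template (the WARN level is my choice, to keep the Eclipse console quiet):

# ${project_loc}/conf/log4j.properties -- a minimal sketch.
# WARN instead of the default INFO keeps the console readable.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n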
Occasionally, you may want to add other environment variables such as TERM, SPARK_LOCAL_IP, and so on:
- Name: TERM, Value on Mac: xterm-256color, Value on Linux: xterm (if you want to use xterm, of course)
- Name: SPARK_LOCAL_IP, Value: 127.0.0.1 (it is recommended to specify your real local IP address instead)
PS: I don't remember the source of this tutorial, so excuse me for not citing the author. I didn't come up with this myself.