PySpark in Eclipse: using PyDev


Question

I am running a local pyspark script from the command line and it works:

/Users/edamame/local-lib/apache-spark/spark-1.5.1/bin/pyspark --jars myJar.jar --driver-class-path myJar.jar --executor-memory 2G --driver-memory 4G --executor-cores 3 /myPath/myProject.py

Is it possible to run this code from Eclipse using PyDev? What are the arguments required in the Run Configuration? I tried and got the following errors:

Traceback (most recent call last):
  File "/myPath/myProject.py", line 587, in <module>
    main()
  File "/myPath/myProject.py", line 506, in main
    conf = SparkConf()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/java_gateway.py", line 76, in launch_gateway
    proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Does anyone have any idea? Thank you very much!

Answer

Considering the following prerequisites:


  • Eclipse with PyDev and Spark installed.

  • PyDev configured with a Python interpreter.

  • PyDev configured with Spark's Python sources.

Here is what you need to do:


  • From the Eclipse IDE, check that you are in the PyDev perspective:


  • On Mac: Eclipse > Preferences

  • On Linux: Window > Preferences

From the Preferences window, go to PyDev > Interpreters > Python Interpreter:


  • Click on the central button [Environment]
  • Click on the button [New...] to add a new environment variable.
  • Add the environment variable SPARK_HOME and validate:
  • Name: SPARK_HOME, Value: /path/to/apache-spark/spark-1.5.1/
  • Note: don't use system environment variables such as $SPARK_HOME
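
At this point a minimal script is enough to check that PyDev can launch Spark's JVM gateway. A small sketch (the app name and the sample job are placeholders of my own, not from the original question):

from pyspark import SparkConf, SparkContext

# If SPARK_HOME is visible to the interpreter, SparkConf() no longer fails
# inside launch_gateway() as it did in the traceback above.
conf = SparkConf().setAppName("pydev-smoke-test").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Trivial job: should print 10 if the gateway and local executors work.
print(sc.parallelize(range(10)).count())

sc.stop()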

I also recommend that you maintain your own log4j.properties file in each of your projects.

To do so, you'll need to add the environment variable SPARK_CONF_DIR as done previously, for example:

Name: SPARK_CONF_DIR, Value: ${project_loc}/conf
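
For reference, a minimal log4j.properties to place in that conf directory could look like the snippet below, modeled on Spark's bundled conf/log4j.properties.template; it is only a common starting point, not something the tutorial prescribes, so adjust the levels to taste:

# Log at WARN to the console; raise or lower the level as needed.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n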

If you experience problems with the variable ${project_loc} (e.g. on Linux), specify an absolute path instead.

Or, if you want to keep ${project_loc}, right-click on each Python source and choose Run As > Run Configurations, then create your SPARK_CONF_DIR variable in the Environment tab as described previously.

Occasionally, you can add other environment variables such as TERM, SPARK_LOCAL_IP and so on:


  • Name: TERM, Value on Mac: xterm-256color, Value on Linux: xterm, if you want to use xterm of course

  • Name: SPARK_LOCAL_IP, Value: 127.0.0.1 (it is recommended to specify your real local IP address)
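
If editing Run Configurations per script gets tedious, the same variables can also be exported from the script itself before pyspark creates its JVM gateway. This is an alternative I am suggesting, not part of the original tutorial, and all paths below are placeholders to adjust:

import os

# These must be set before SparkContext is created, since launch_gateway() reads them.
os.environ.setdefault("SPARK_HOME", "/path/to/apache-spark/spark-1.5.1")        # placeholder
os.environ.setdefault("SPARK_CONF_DIR", "/absolute/path/to/your/project/conf")  # placeholder
os.environ.setdefault("SPARK_LOCAL_IP", "127.0.0.1")

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("env-in-code").setMaster("local[*]"))
print(sc.parallelize([1, 2, 3]).sum())
sc.stop()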

PS: I don't remember the source of this tutorial, so excuse me for not citing the author. I didn't come up with this myself.

