Running pyspark code in Intellij

Question

I have followed the steps to set up pyspark in IntelliJ from this question:

Write and run pyspark in IntelliJ IDEA

Here is the simple code I attempted to run:

#!/usr/bin/env python
from pyspark import SparkContext, SparkConf

import numpy as np


def p(msg):
    # Print the repr of msg followed by a blank line.
    print("%s\n" % repr(msg))

a = np.array([[1, 2, 3], [4, 5, 6]])
p(a)

# Creating the SparkContext is the step that fails below.
sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))

ardd = sc.parallelize(a)
p(ardd.collect())

Here is the result of submitting the code:

NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
  File "/git/misc/python/ptest.py", line 14, in <module>
    sc = SparkContext("local","ptest",SparkConf().setAppName("x"))
  File "/shared/spark16/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/shared/spark16/python/pyspark/context.py", line 245, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/shared/spark16/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

However I really do not understand how this could be expected to work: in order to run in Spark the code needs to be bundled up and submitted via spark-submit.

So I doubt that the other question actually addressed submitting pyspark code through Intellij to Spark.

Is there a way to submit pyspark code to Spark? It would effectively be:

  spark-submit myPysparkCode.py

The pyspark executable itself has been deprecated since Spark 1.0. Has anyone gotten this working?

Answer

In my case the variable settings from the other Q&A, Write and run pyspark in IntelliJ IDEA, covered most but not all of the required settings. I tried them many times.
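
For reference, those settings are environment variables on the IntelliJ run configuration. A sketch of what they cover, assuming the /shared/spark16 install visible in the traceback above and a Spark 1.6-era py4j zip (adjust both paths to your own installation):

  SPARK_HOME=/shared/spark16
  PYTHONPATH=/shared/spark16/python:/shared/spark16/python/lib/py4j-0.9-src.zip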

Only after adding:

  PYSPARK_SUBMIT_ARGS=pyspark-shell

to the run configuration did pyspark finally quiet down and succeed.
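
Equivalently, the variable can be set from the script itself before the SparkContext is created, so the run configuration needs no extra entry. A minimal sketch, assuming pyspark is already importable via the PYTHONPATH setup above:

#!/usr/bin/env python
import os

# Must be set before the SparkContext is created: launch_gateway()
# reads it when spawning the JVM, and without the pyspark-shell
# primary resource spark-submit aborts as in the traceback above.
os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"

from pyspark import SparkContext, SparkConf

sc = SparkContext("local", "ptest", conf=SparkConf().setAppName("x"))
print(sc.parallelize([1, 2, 3]).collect())   # expect [1, 2, 3]
sc.stop()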
