Pyspark import .py file not working

Question

My goal is to import a custom .py file into my Spark application and call some of the functions included in that file.

Here is what I have tried:

I have a test file called Test.py which looks as follows:

def func():
    print("Import is working")

Inside my Spark application I do the following (as described in the docs):

sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])

I also tried this instead (after the Spark context is created):

sc.addFile("/[AbsolutePathTo]/Test.py")

I even tried the following when submitting my Spark application:

./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py

However, I always get a name error:

NameError: name 'func' is not defined

when I call func() inside my app.py (I get the same error for 'Test' if I try to call Test.func()).

Finally, I also tried importing the file inside the pyspark shell with the same command as above:

sc.addFile("/[AbsolutePathTo]/Test.py")

Strangely, I do not get an error on the import, but I still cannot call func() without getting the error. Also, not sure if it matters, but I'm running Spark locally on one machine.

I really tried everything I could think of, but still cannot get it to work. Probably I am missing something very simple. Any help would be appreciated.

Answer

Alright, actually my question is rather stupid. After doing:

sc.addFile("/[AbsolutePathTo]/Test.py")

I still have to import the Test.py file, just as I would import a regular Python file, with:

import Test

Then I can call:

Test.func()

and it works. I thought the "import Test" was not necessary since I add the file to the Spark context, but apparently that does not have the same effect. Thanks mark91 for pointing me in the right direction.
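
For what it's worth, the same rule applies when the file is shipped with --py-files (or the pyFiles= constructor argument): those options only make the module available for import on the driver and the workers, they do not import it for you. A minimal sketch, assuming the same Test.py as above and a submission like spark-submit --py-files /[AbsolutePath]/Test.py app.py (the master and app name below are placeholders):

from pyspark import SparkContext

sc = SparkContext("local[4]", "py-files demo")

# --py-files has already put Test.py on the PYTHONPATH,
# but the import still has to be written out explicitly.
import Test

Test.func()  # prints "Import is working"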

UPDATE 28.10.2017:

As asked in the comments, here are more details on app.py:

from pyspark import SparkContext
from pyspark.conf import SparkConf

conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Stream")
sc = SparkContext(conf=conf)
sc.addFile("Test.py")  # distributes the file, but does not import it

import Test  # the explicit import is what makes func() available

Test.func()
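
As an aside, PySpark also provides sc.addPyFile, which is the API intended for shipping Python dependencies: unlike plain addFile, it also places the file on the workers' import path. The explicit import is still required either way. A minimal sketch of that variant, assuming the same Test.py (not part of the original answer; the app name is a placeholder):

from pyspark import SparkContext
from pyspark.conf import SparkConf

conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("addPyFile demo")
sc = SparkContext(conf=conf)

# addPyFile ships Test.py to the workers and makes it importable there
sc.addPyFile("Test.py")

import Test

Test.func()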
