Pyspark import .py file not working
Question
My goal is to import a custom .py file into my Spark application and call some of the functions included inside that file.
Here is what I have tried:
I have a test file called Test.py which looks as follows:
def func():
    print("Import is working")
Inside my Spark application I do the following (as described in the docs):
sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])
I also tried this instead (after the Spark context is created):
sc.addFile("/[AbsolutePathTo]/Test.py")
I even tried the following when submitting my Spark application:
./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py
However, I always get a name error:
NameError: name 'func' is not defined
when I call func() inside my app.py (and the same error for 'Test' if I try to call Test.func()).
Finally, I also tried importing the file inside the pyspark shell with the same command as above:
sc.addFile("/[AbsolutePathTo]/Test.py")
Strangely, I do not get an error on the import, but still, I cannot call func() without getting the error. Also, not sure if it matters, but I'm using Spark locally on one machine.
I really tried everything I could think of, but still cannot get it to work. Probably I am missing something very simple. Any help would be appreciated.
Answer
Alright, actually my question was rather simple. After doing:
sc.addFile("/[AbsolutePathTo]/Test.py")
I still have to import the Test.py file, just as I would import a regular Python module, with:
import Test
Then I can call:
Test.func()
and it works. I thought that the "import Test" was not necessary since I add the file to the Spark context, but apparently that does not have the same effect. Thanks mark91 for pointing me in the right direction.
UPDATE 28.10.2017:
As asked in the comments, here are more details on app.py:
from pyspark import SparkContext
from pyspark.conf import SparkConf

conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Stream")
sc = SparkContext(conf=conf)

# Ship the file to the workers *and* import it on the driver:
sc.addFile("Test.py")
import Test

Test.func()
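The rule the answer turns on is plain Python, not Spark: making a module file available on disk or on the path (roughly what sc.addFile does for the driver's working directory) never runs an import, so the module's names stay undefined until you import it explicitly. A minimal Spark-free sketch of this, using a temporary directory to stand in for the shipped file:

```python
import importlib
import os
import sys
import tempfile

# Create a module file analogous to Test.py (hypothetical content).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "Test.py"), "w") as f:
    f.write('def func():\n    return "Import is working"\n')

# Having the file available is not enough: the module must still be
# imported before its names are usable, just like after sc.addFile.
sys.path.insert(0, workdir)
Test = importlib.import_module("Test")
print(Test.func())  # -> Import is working
```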