What is the difference between spark-submit and pyspark?
Question
If I start up pyspark and then run this command:
import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')
Everything is A-ok. If, however, I try to do the same thing through the commandline and spark-submit, I get an error:
Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/
File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
return f(iterator)
File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
merger.mergeValues(iterator)
File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
for k, v in iterator:
File "/.../my_script.py", line 173, in _json_args_to_arr
js = cls._json(line)
RuntimeError: uninitialized staticmethod object
my_script:
...
if __name__ == "__main__":
    args = sys.argv[1:]
    if args[0] == 'collapse':
        directory = args[1]
        from pyspark import SparkContext
        sc = SparkContext(appName="Collapse")
        spark = Sparker(sc)
        spark.collapse(directory)
        sc.stop()
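A common workaround in this situation (a sketch only, assuming the failure comes from referencing the staticmethod cls._json inside a function that Spark pickles and ships to executors) is to move the helper out of the class to module level; plain module-level functions are serialized by reference and survive the pickle round trip cleanly:

```python
import pickle

# Hypothetical stand-in for the _json parsing helper from my_script:
# defined at module level rather than as a staticmethod reached
# through the class inside a closure.
def json_line_to_arr(line):
    # placeholder parsing logic for illustration
    return line.split(',')

# The function round-trips through pickle, as it would when Spark
# ships a closure that refers to it to the workers:
restored = pickle.loads(pickle.dumps(json_line_to_arr))
print(restored("a,b,c"))  # → ['a', 'b', 'c']
```

The names here are illustrative; the point is that the object Spark serializes should be a plain function (or a method reached through a normal instance), not a raw staticmethod object pulled off a class.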
Why is this happening? What's the difference between running pyspark and running spark-submit that would cause this divergence? And how can I make this work in spark-submit?
Edit: I tried running this from the bash shell by doing pyspark my_script.py collapse ./data/ and I got the same error. The only time everything works is when I am in a Python shell and import the script.
Answer
spark-submit sends your code to the workers in a cluster to execute; pyspark, by contrast, launches an interactive shell on the driver with a SparkContext already created and bound to the name sc.
Check: http://spark.apache.org/docs/latest/submitting-applications.html
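The two launch modes from the question can be contrasted as follows (a sketch; the paths and arguments are the ones used above):

```shell
# Interactive: pyspark starts a Python REPL on the driver with a
# SparkContext already bound to the name `sc`, so `import my_script`
# and `my_script.Sparker(sc)` run entirely in that driver session.
pyspark

# Batch: spark-submit ships my_script.py (and the closures it builds)
# to the executors; the script must create its own SparkContext in
# its __main__ block, as my_script.py does.
/usr/local/spark/bin/spark-submit my_script.py collapse ./data/
```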