What is the difference between spark-submit and pyspark?


Problem description



If I start up pyspark and then run this command:

import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')

Everything is A-ok. If, however, I try to do the same thing through the commandline and spark-submit, I get an error:

Command: /usr/local/spark/bin/spark-submit my_script.py collapse ./data/
  File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
    return f(iterator)
  File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
    merger.mergeValues(iterator)
  File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
    for k, v in iterator:
  File "/.../my_script.py", line 173, in _json_args_to_arr
    js = cls._json(line)
RuntimeError: uninitialized staticmethod object

my_script:

...
if __name__ == "__main__":
    args = sys.argv[1:]
    if args[0] == 'collapse':
        directory = args[1]
        from pyspark import SparkContext
        sc = SparkContext(appName="Collapse")
        spark = Sparker(sc)
        spark.collapse(directory)
        sc.stop()

Why is this happening? What's the difference between running pyspark and running spark-submit that would cause this divergence? And how can I make this work in spark-submit?

EDIT: I tried running this from the bash shell by doing pyspark my_script.py collapse ./data/ and I got the same error. The only time when everything works is when I am in a python shell and import the script.

Solution

  1. If you have built a Spark application, you need to use spark-submit to run it.

    • The code can be written in either Python or Scala.

    • The mode can be either local or cluster.

  2. If you just want to test or run a few individual commands interactively, you can use one of the shells that Spark provides (see the sketch after this list):

    • pyspark (for Spark in Python)
    • spark-shell (for Spark in Scala)
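
To make the distinction concrete, here is a minimal sketch of both workflows. The file name my_app.py, the application name, and the input path are placeholders for illustration, not details taken from the original question.

# my_app.py - a self-contained application intended to be launched with spark-submit
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # spark-submit starts a fresh Python process, so the script must create
    # its own SparkContext; nothing is predefined for it
    sc = SparkContext(appName="WordCount")
    path = sys.argv[1] if len(sys.argv) > 1 else "./data/"
    counts = (sc.textFile(path)
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))
    sc.stop()

Such a script would be launched with something like /usr/local/spark/bin/spark-submit my_app.py ./data/. In the interactive shell, by contrast, a SparkContext is already created for you and bound to the variable sc, so you only type the transformations:

# inside the pyspark shell; sc already exists
counts = sc.textFile("./data/").flatMap(lambda line: line.split()) \
           .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
counts.take(10)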
