How to know the deploy mode of a PySpark application?

Question

I am trying to fix an out-of-memory issue, and I want to know whether I need to change these settings in the default configuration file (spark-defaults.conf) in the Spark home folder, or whether I can set them in the code.

I saw the question "PySpark: java.lang.OutofMemoryError: Java heap space", and it says the answer depends on whether I'm running in client mode. I'm running Spark on a cluster and monitoring it using standalone.

But how do I figure out whether I'm running Spark in client mode?

Recommended answer

If you are running an interactive shell, e.g. pyspark (CLI or via an IPython notebook), you are running in client mode by default. You can easily verify that neither pyspark nor any other interactive shell can run in cluster mode:

$ pyspark --master yarn --deploy-mode cluster
Python 2.7.11 (default, Mar 22 2016, 01:42:54)
[GCC Intel(R) C++ gcc 4.8 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.

$ spark-shell --master yarn --deploy-mode cluster
Error: Cluster deploy mode is not applicable to Spark shells.

Examining the contents of the bin/pyspark file may be instructive, too. Here is the final line (which is the actual executable):

$ pwd
/home/ctsats/spark-1.6.1-bin-hadoop2.6
$ cat bin/pyspark
[...]
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"

That is, pyspark is actually a script run by spark-submit and given the name PySparkShell, by which you can find it in the Spark History Server UI. Since it is run that way, it goes by whatever arguments (or defaults) are included with its spark-submit command.
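
To check the mode from code rather than from the command line, one option is to look it up in the application's configuration. The sketch below is not part of the original answer. It assumes your Spark version records the mode under the spark.submit.deployMode property, and get_deploy_mode is a hypothetical helper name. A plain dict stands in for the real configuration so the example runs without a cluster.

```python
def get_deploy_mode(conf, default="client"):
    """Return the deploy mode stored in a Spark configuration object.

    `conf` can be anything with a dict-style .get(key, default) method,
    for example the SparkConf returned by sc.getConf(), or a plain dict.
    """
    # Assumption: the mode is recorded under spark.submit.deployMode.
    # If the key is absent, fall back to "client", which is the
    # spark-submit default when no --deploy-mode is given.
    return conf.get("spark.submit.deployMode", default)


# Stand-in for sc.getConf(); the values here are illustrative only.
example_conf = {"spark.master": "yarn", "spark.submit.deployMode": "client"}
print(get_deploy_mode(example_conf))
```

Inside a pyspark shell or a submitted script, the equivalent call would be get_deploy_mode(sc.getConf()), where sc is the active SparkContext.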
