How do I configure driver memory when running Spark in local mode via Sparklyr?
Question
I am using Sparklyr to run a Spark application in local mode on a virtual machine with 244GB of RAM. In my code I use spark_read_csv() to read in ~50MB of csvs from one folder and then ~1.5GB of csvs from a second folder. My issue is that the application throws an error when trying to read in the second folder.
As I understand it, the issue is that the default RAM available to the driver JVM is 512MB - too small for the second folder (in local mode all operations are run within the driver JVM, as described in How to set Apache Spark Executor memory). So I need to increase the spark.driver.memory parameter to something larger.
The issue is that I cannot set this parameter through the normal methods described in the sparklyr documentation (i.e. via spark_config(), the config.yml file, or the spark-defaults.conf file):
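For context, this is a sketch of the kind of entry one would normally put in config.yml; as the quote below explains, in local mode this particular setting is read too late to have any effect (the 5G value is illustrative):

```yaml
# config.yml - the usual way to set Spark options via sparklyr.
# In local mode this setting is ignored, because the driver JVM
# has already been launched with default memory by the time the
# configuration is applied.
default:
  spark.driver.memory: 5G
```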
in local mode, by the time you run spark-submit, a JVM has already been launched with the default memory settings, so setting "spark.driver.memory" in your conf won't actually do anything for you. Instead, you need to run spark-submit as follows:
bin/spark-submit --driver-memory 2g --class your.class.here app.jar
(from How to set Apache Spark Executor memory).
I thought I could replicate the bin/spark-submit command above by adding the sparklyr.shell.driver-memory option to the config.yml. As stated in the sparklyr documentation, sparklyr.shell* options are command line parameters that get passed to spark-submit, i.e. adding sparklyr.shell.driver-memory: 5G to the config.yml file should be equivalent to running bin/spark-submit --driver-memory 5G.
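As a sketch, the config.yml entry described above would look like this (the 5G value is illustrative):

```yaml
# config.yml - the sparklyr.shell.* prefix turns the rest of the key
# into a spark-submit command line flag, so this should be equivalent
# to: bin/spark-submit --driver-memory 5G
default:
  sparklyr.shell.driver-memory: 5G
```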
I have now tried all of the above options and none of them change driver memory in the Spark application (which I check by looking at the 'Executors' tab of the Spark UI).
So how can I change driver memory when running Spark in local mode via Sparklyr?
Answer
Thanks for the suggestions @Aydin K. Ultimately I was able to configure driver memory by first updating Java to 64-bit (which allows utilisation of more than 4GB of RAM in the JVM), then using the sparklyr.shell* parameters within the spark_config() object:
# Set driver and executor memory via sparklyr.shell.* options,
# which are passed to spark-submit as command line flags
config <- spark_config()
config$`sparklyr.shell.driver-memory` <- '30G'
config$`sparklyr.shell.executor-memory` <- '30G'
sc <- spark_connect(master = 'local', version = '2.0.1', config = config)