Spark Java: Cannot change driver memory


Problem description


So, I have a Spark standalone cluster with 16 worker nodes and one master node. I start the cluster with the "sh start-all.sh" command from the master node in the spark_home/conf folder. The master node has 32 GB RAM and 14 VCPUs, while the worker nodes have 16 GB RAM and 8 VCPUs each. I also have a Spring application which, when it starts (with java -jar app.jar), initializes the Spark context. The spark-env.sh file is:

export SPARK_MASTER_HOST='192.168.100.17'
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=14000mb 
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_OPTS='-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=172800 -Dspark.worker.cleanup.appDataTtl=172800'

I do not have anything in spark-defaults.conf, and the code that initializes the Spark context programmatically is:

@Bean
public SparkSession sparksession() {
    SparkSession sp = SparkSession
            .builder()
            .master("spark://....")
            .config("spark.cassandra.connection.host", "192.168.100......")
            .appName("biomet")
            .config("spark.driver.memory", "20g")
            .config("spark.driver.maxResultSize", "10g")
            .config("spark.sql.shuffle.partitions", 48)
            .config("spark.executor.memory", "7g")
            .config("spark.sql.pivotMaxValues", "50000")
            .config("spark.sql.caseSensitive", true)
            .config("spark.executor.extraClassPath", "/home/ubuntu/spark-2.4.3-bin-hadoop2.7/jars/guava-16.0.1.jar")
            .config("spark.hadoop.fs.s3a.access.key", "...")
            .config("spark.hadoop.fs.s3a.secret.key", "...")
            .getOrCreate();
    return sp;
}

After all this, the Environment tab of the Spark UI shows spark.driver.maxResultSize 10g and spark.driver.memory 20g, BUT the Executors tab shows 0.0 B / 4.3 GB for the driver's storage memory.

(FYI: I used to have spark.driver.memory set to 10g programmatically, and the Executors tab said 4.3 GB, but now it seems I cannot change it. And even when I had it at 10g, wasn't it supposed to give me more than 4.3 GB?!)

How can I change the driver memory? I tried setting it from spark-defaults.conf, but nothing changed. Even if I do not set the driver memory at all (or set it to less than 4.3 GB), it still says 4.3 GB in the Executors tab.

Solution

I suspect that you're running your application in client mode; in that case, per the documentation:

Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command line option in the client mode. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point.

In the current case, the Spark job is submitted from the application, so the application itself is the driver, and its memory is regulated as usual for Java applications - via -Xmx, etc.
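Since the driver here is the Spring application's own JVM, a minimal sketch of the fix (the 20g figure and the app.jar name are simply the values from the question; choose whatever heap you actually need) is to start the application with an explicit maximum heap instead of setting spark.driver.memory in the builder:

# assumes the Spring app jar from the question; 20g is just an example heap size
java -Xmx20g -jar app.jar

If the application were launched through spark-submit instead, --driver-memory (or spark.driver.memory in spark-defaults.conf) would take effect, because spark-submit applies it before the driver JVM starts; with a plain java -jar launch, only the JVM options matter. As a rough sanity check, with Spark's default unified-memory settings the storage-memory total in the Executors tab is about spark.memory.fraction (0.6 by default) of the usable heap (max heap minus 300 MB reserved by Spark), and with the JVM's default max heap of roughly a quarter of the master's 32 GB of RAM, that works out to right around the 4.3 GB being shown, which is why the number does not move when spark.driver.memory changes.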
