Memory allocation to executors and tasks in Spark

Problem description

My cluster configuration is as follows: 7 nodes, each with 32 cores and 252 GB of memory.

The YARN configuration is as follows:

yarn.scheduler.maximum-allocation-mb - 10GB
yarn.scheduler.minimum-allocation-mb - 2GB
yarn.nodemanager.vmem-pmem-ratio - 2.1
yarn.nodemanager.resource.memory-mb - 22GB
yarn.scheduler.maximum-allocation-vcores - 25
yarn.scheduler.minimum-allocation-vcores - 1
yarn.nodemanager.resource.cpu-vcores - 25

The MapReduce configuration is as follows:

mapreduce.map.java.opts - -Xmx1638m
mapreduce.map.memory.mb - 2GB
mapreduce.reduce.java.opts - -Xmx3276m
mapreduce.reduce.memory.mb - 4GB

The Spark configuration is as follows:

spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384

Now I tried running spark-shell with --master yarn and different values for executor-memory, num-executors, and executor-cores.

  1. spark-shell --master yarn --executor-memory 9856M --num-executors 175 --executor-cores 1

In this case the executor memory + the 384MB overhead cannot exceed the 10GB yarn.scheduler.maximum-allocation-mb. So here 9856M + 384M = 10240M = 10GB, which just fits. Now once the spark shell is up, the total number of executors was 124 instead of the requested 175. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is 6.7 GB (i.e. about 67% of 10GB).

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
8478    hdp66-ss  20    0   13.5g 1.1g  25m S  1.9  0.4   2:11.28

So virtual memory is 13.5G and physical memory is 1.1G.
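
As a side note on how YARN arrives at the container size: with the default resource calculator, YARN rounds each request (executor memory + overhead) up to a multiple of yarn.scheduler.minimum-allocation-mb and rejects anything above yarn.scheduler.maximum-allocation-mb. Below is a minimal sketch of that rule using the values from the configuration above; the object and method names are mine for illustration, not YARN's API.

    // Hypothetical sketch of YARN's container-size normalization,
    // using the scheduler settings listed above (not YARN's real API).
    object ContainerSizing {
      val minAllocMb = 2048L  // yarn.scheduler.minimum-allocation-mb
      val maxAllocMb = 10240L // yarn.scheduler.maximum-allocation-mb

      // Round (executor memory + overhead) up to a multiple of the
      // minimum allocation, then enforce the maximum.
      def containerSizeMb(executorMemMb: Long, overheadMb: Long): Long = {
        val requested = executorMemMb + overheadMb
        val rounded = ((requested + minAllocMb - 1) / minAllocMb) * minAllocMb
        require(rounded <= maxAllocMb,
          s"${rounded}MB exceeds yarn.scheduler.maximum-allocation-mb")
        rounded
      }
    }

    // 9856 + 384 = 10240MB, already a multiple of 2048MB and exactly at
    // the cap, so the request above is granted a 10GB container:
    //   ContainerSizing.containerSizeMb(9856, 384) == 10240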

  2. spark-shell --master yarn --executor-memory 9856M --num-executors 35 --executor-cores 5

In this case the executor memory + the 384MB overhead again comes to exactly 10GB, the yarn.scheduler.maximum-allocation-mb cap, so it works fine. Once the spark shell is up, the total number of executors was 35, as requested. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is again 6.7 GB (i.e. about 67% of 10GB).

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
5256    hdp66-ss  20    0   13.2g 1.1g  25m S  2.6  0.4   1:25.25

So virtual memory is 13.2G and physical memory is 1.1G.

  3. spark-shell --master yarn --executor-memory 4096M --num-executors 200 --executor-cores 1

In this case the executor memory + the 384MB overhead must stay within the 10GB yarn.scheduler.maximum-allocation-mb. Here 4096M + 384M = 4480M (about 4.4GB), well under the cap, so it works fine. Once the spark shell is up, the total number of executors was 200, as requested. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is 2.7 GB (i.e. about 67% of 4GB).

The top command output for the spark-shell process is as follows:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
21518   hdp66-ss  20    0   19.2g 1.4g  25m S  3.9  0.6   2:24.46

So virtual memory is 19.2G and physical memory is 1.4G.
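
For context on the virtual/physical split: the NodeManager kills a container only when its virtual memory exceeds the granted container size multiplied by yarn.nodemanager.vmem-pmem-ratio, while physical memory (RES) grows only as the JVM actually touches pages. A back-of-the-envelope check with the 2.1 ratio from the configuration above, as a sketch (not the NodeManager's actual code):

    // Hypothetical sketch of the NodeManager's virtual-memory ceiling
    // for a 10GB container with vmem-pmem-ratio = 2.1.
    val vmemPmemRatio = 2.1    // yarn.nodemanager.vmem-pmem-ratio
    val containerMb   = 10240  // granted container size in cases 1 and 2
    val vmemLimitMb   = (containerMb * vmemPmemRatio).toLong // 21504MB ≈ 21GB

    // A VIRT of 13.5G stays well under this ~21GB ceiling, which is why
    // VIRT can exceed the 10GB request without the container being
    // killed, while RES stays small until pages are actually used.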

So can someone please explain to me how these memory figures and executor counts are arrived at? Why is the memory shown on the Spark UI 67% of the requested executor memory? And how are the virtual and physical memory decided for each executor?

Solution

Spark almost always makes only 65% to 70% of the memory requested for the executors available as storage/execution memory. This behavior of Spark is documented in the Spark JIRA ticket SPARK-12579.

The snippet below is from UnifiedMemoryManager.scala in the Apache Spark repository, the file that calculates the usable executor memory, among other things.

    // From UnifiedMemoryManager.getMaxMemory: validate the requested
    // executor memory, then derive the usable fraction of the heap.
    if (conf.contains("spark.executor.memory")) {
      val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
      if (executorMemory < minSystemMemory) {
        throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
          s"$minSystemMemory. Please increase executor memory using the " +
          s"--executor-memory option or spark.executor.memory in Spark configuration.")
      }
    }
    // Only (systemMemory - reservedMemory) * spark.memory.fraction is
    // reported as storage/execution memory; the remainder is left for
    // user data structures and Spark's internal metadata.
    val usableMemory = systemMemory - reservedMemory
    val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
    (usableMemory * memoryFraction).toLong

The above code is responsible for the behavior you see: only (systemMemory - reservedMemory) * spark.memory.fraction is reported as available, as a safeguard for the scenario where the cluster may not actually have as much memory as the user requested.
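
To make the 67% figure concrete, the same arithmetic can be replayed for the 9856M request. Two caveats, both assumptions on my part: systemMemory comes from Runtime.getRuntime.maxMemory, which reports somewhat less than -Xmx (roughly 90-95% of it, depending on GC settings), and spark.memory.fraction defaulted to 0.75 in Spark 1.6 before being lowered to 0.6 in Spark 2.0, so the exact percentage depends on the Spark version.

    // Hypothetical replay of getMaxMemory's arithmetic for
    // --executor-memory 9856M; the 0.94 heap factor is an assumption
    // about what Runtime.getRuntime.maxMemory reports for this -Xmx.
    val xmxMb          = 9856L
    val systemMemoryMb = (xmxMb * 0.94).toLong // ~9264MB reported by the JVM
    val reservedMb     = 300L                  // RESERVED_SYSTEM_MEMORY_BYTES
    val fraction       = 0.75                  // Spark 1.6 default (0.6 in 2.x)
    val usableMb       = ((systemMemoryMb - reservedMb) * fraction).toLong
    // (9264 - 300) * 0.75 ≈ 6723MB, i.e. roughly the 6.7GB (~67%)
    // that the Spark UI reports for each executor.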
