How to set Apache Spark Executor memory


Problem description

How can I increase the memory available to Apache Spark executor nodes?

I have a 2 GB file that is suitable for loading into Apache Spark. I am currently running Apache Spark on a single machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.

When I try to count the lines of the file after marking it to be cached in memory, I get errors like this:

2014-10-25 22:25:12 WARN  CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
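For context, a minimal spark-shell session that reproduces this pattern might look like the sketch below; the file path is a placeholder, not the actual file from the question:

    // Run inside spark-shell, where `sc` (the SparkContext) is already created.
    // The path is a hypothetical stand-in for the 2 GB file.
    val lines = sc.textFile("/data/big-file.txt")
    lines.cache()   // mark the RDD for in-memory caching
    lines.count()   // forces the load; the "Not enough space to cache partition" warning appears when storage memory runs out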

I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
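For reference, the line added to $SPARK_HOME/conf/spark-defaults.conf would look like this (same style as the driver setting shown in the answer below):

    spark.executor.memory            4g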

The UI shows that this variable is set in the Spark Environment. You can find a screenshot here.

However, when I go to the Executor tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.

I tried various things mentioned here, but I still get the error and don't have a clear idea of where I should change the setting.

I am running my code interactively from the spark-shell.

Recommended answer

Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason is that the Worker "lives" within the driver JVM process that is started when you launch spark-shell, and the default memory used for that process is 512 MB. You can increase it by setting spark.driver.memory to something higher, for example 5g. You can do that by either:

  • setting it in the properties file (the default is $SPARK_HOME/conf/spark-defaults.conf):

spark.driver.memory              5g

  • or by supplying the configuration setting at runtime:

    $ ./bin/spark-shell --driver-memory 5g
    

Note that this cannot be achieved by setting it in the application, because by then it is already too late; the process has already started with some amount of memory.
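To illustrate the point above (this snippet is not from the original answer, just a sketch of the approach it warns against): setting the value programmatically looks plausible but cannot enlarge a JVM heap that is already running.

    // Sketch of the approach the note warns against. By the time this code runs,
    // the spark-shell driver JVM has already started with its fixed heap size,
    // so this setting has no effect on the current process.
    import org.apache.spark.SparkConf
    val conf = new SparkConf().set("spark.driver.memory", "5g")  // too late at this point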

The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction to the total amount of storage memory, and by default they are 0.6 and 0.9.

    512 MB * 0.6 * 0.9 ~ 265.4 MB
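Strictly speaking, 512 MB * 0.6 * 0.9 is about 276 MB; the reported 265.4 MB is a little lower because Spark applies the fractions to the heap size the JVM actually reports (Runtime.getRuntime.maxMemory), which comes out somewhat below the configured -Xmx. A quick, hedged check you can paste into spark-shell:

    // Approximate reproduction of the storage-memory figure (Spark 1.x legacy memory management).
    val maxMem = Runtime.getRuntime.maxMemory      // usable heap reported by the JVM, a bit under -Xmx
    val storage = maxMem * 0.6 * 0.9               // spark.storage.memoryFraction * spark.storage.safetyFraction
    println(f"storage memory ~ ${storage / 1024 / 1024}%.1f MB")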
    

So be aware that not the whole amount of driver memory will be available for RDD storage.

But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.
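For completeness, on a cluster the executor memory can be raised the same way as the driver memory above, either in spark-defaults.conf or on the command line; for example (the master URL is a placeholder):

    $ ./bin/spark-shell --master spark://master:7077 --executor-memory 4g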

