How to set Apache Spark Executor memory


Question

How can I increase the memory available for Apache Spark executor nodes?

I have a 2 GB file that is suitable for loading into Apache Spark. I am running Apache Spark on a single machine for the moment, so the driver and executor are on the same machine. The machine has 8 GB of memory.

When I try to count the lines of the file after setting the file to be cached in memory, I get these errors:

2014-10-25 22:25:12 WARN  CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
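For context, this is roughly the spark-shell session being described (a sketch; the file path and variable name are hypothetical):

    scala> val lines = sc.textFile("/path/to/file.txt")  // the 2 GB input file
    scala> lines.cache()   // mark the RDD to be cached in memory
    scala> lines.count()   // the action that triggers caching and the warning above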

I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
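That is, a line like this in spark-defaults.conf:

    spark.executor.memory            4g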

The UI shows this variable is set in the Spark Environment. You can find a screenshot here.

However, when I go to the Executors tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.

I tried the various things mentioned here (http://stackoverflow.com/questions/24242060/how-to-change-memory-per-node-for-apache-spark-worker), but I still get the error and don't have a clear idea where I should change the setting.

I am running my code interactively from the spark-shell.

Answer

Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason for this is that the Worker "lives" within the driver JVM process that you start when you launch spark-shell, and the default memory used for that is 512M. You can increase that by setting spark.driver.memory to something higher, for example 5g. You can do that by either:


  • setting it in the properties file (default is spark-defaults.conf):

spark.driver.memory              5g


  • or by supplying the configuration setting at runtime:

    $ ./bin/spark-shell --driver-memory 5g
    


  • Note that this cannot be achieved by setting it in the application, because it is already too late by then: the process has already started with some amount of memory.
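    For illustration, this is the kind of in-application setting that will not work here, because the driver JVM is already running by the time it executes (a hypothetical standalone-app sketch, not from the original answer):

        import org.apache.spark.{SparkConf, SparkContext}

        // Too late: the JVM hosting the driver was already launched with its
        // default heap, so this cannot grow the memory of the running process.
        val conf = new SparkConf().setAppName("example").set("spark.driver.memory", "5g")
        val sc = new SparkContext(conf)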

    The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction to the total amount of storage memory (see BlockManager.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1152), and by default they are 0.6 and 0.9.

    512 MB * 0.6 * 0.9 ~ 265.4 MB
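    If you want to check the effective storage limit from the shell, one option (a sketch; getExecutorMemoryStatus returns, per executor, the maximum and remaining memory available for caching) is:

        scala> sc.getExecutorMemoryStatus
        // e.g. Map(localhost:PORT -> (maxStorage, remainingStorage)); in local mode
        // the single entry reflects the ~265.4 MB limit shown in the UI.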
    

    So be aware that not the whole amount of driver memory will be available for RDD storage.

    But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.
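    On a cluster you would typically pass that the same way, for example (a hypothetical spark-submit invocation; the master URL, class name, and jar are placeholders):

        # placeholders throughout; --executor-memory sets spark.executor.memory
        $ ./bin/spark-submit --master spark://master-host:7077 \
            --executor-memory 4g \
            --class com.example.MyApp \
            my-app.jar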

