How to set Apache Spark Executor memory
Question
How can I increase the memory available for Apache spark executor nodes?
I have a 2 GB file that is suitable for loading into Apache Spark. I am running Apache Spark for the moment on one machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.
When I try to count the lines of the file after setting the file to be cached in memory, I get this error:
2014-10-25 22:25:12 WARN CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
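The setup can be sketched as a short spark-shell session; the file path below is a placeholder, not from the original post:

```scala
// Run inside spark-shell, which provides `sc` (the SparkContext).
// The path is hypothetical; substitute your own ~2 GB text file.
val lines = sc.textFile("/path/to/data.txt")
lines.cache()   // mark the RDD for in-memory caching
lines.count()   // the first action materializes the RDD; with too little
                // memory for storage, the warning above is logged
```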
I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
The UI shows this variable is set in the Spark Environment. You can find a screenshot here.
However, when I go to the Executor tab, the memory limit for my single executor is still set to 265.4 MB, and I still get the same error.
I tried various things mentioned here (http://stackoverflow.com/questions/24242060/how-to-change-memory-per-node-for-apache-spark-worker) but I still get the error and don't have a clear idea where I should change the setting.
I am running my code interactively from the spark-shell.
Answer
Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason for this is that the Worker "lives" within the driver JVM process that you start when you start spark-shell, and the default memory used for that is 512M. You can increase that by setting spark.driver.memory to something higher, for example 5g. You can do that either by:
- setting it in the properties file (default is spark-defaults.conf):

  spark.driver.memory 5g

- or by supplying the configuration setting at runtime:

  $ ./bin/spark-shell --driver-memory 5g
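Either way, you can confirm in the new shell that the setting took effect; `sc` is the SparkContext that spark-shell provides:

```scala
// In a spark-shell started with --driver-memory 5g:
sc.getConf.get("spark.driver.memory")  // expected: "5g"
```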
Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some amount of memory.
The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction (see BlockManager.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1152) to the total amount of storage memory, and by default they are 0.6 and 0.9 respectively.
512 MB * 0.6 * 0.9 ~ 265.4 MB
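A quick check of the arithmetic: 512 * 0.6 * 0.9 is actually about 276 MB. The UI's 265.4 MB comes out because the JVM reports a usable max heap somewhat below the nominal -Xmx512m (one survivor space is excluded from Runtime.maxMemory()); the ~491.5 MB figure below is an assumption that reproduces the UI's number, not a value from the original post:

```python
# Defaults of the legacy (pre-Spark-1.6) static memory model.
memory_fraction = 0.6   # spark.storage.memoryFraction
safety_fraction = 0.9   # spark.storage.safetyFraction

nominal_heap_mb = 512.0
print(nominal_heap_mb * memory_fraction * safety_fraction)   # 276.48

# Runtime.maxMemory() for -Xmx512m is a bit under 512 MB (JVM-dependent);
# ~491.5 MB is an assumed figure that reproduces the UI's number.
reported_heap_mb = 491.5
print(round(reported_heap_mb * memory_fraction * safety_fraction, 1))  # 265.4
```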
So be aware that not the whole amount of driver memory will be available for RDD storage.
But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount of memory to dedicate to Spark's memory cache.
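For the cluster case, a hypothetical spark-submit invocation might look like this (the master URL and application jar are placeholders):

```shell
# --executor-memory sets each executor's heap; --driver-memory the driver's.
$ ./bin/spark-submit \
    --master spark://master-host:7077 \
    --driver-memory 2g \
    --executor-memory 4g \
    my-app.jar
```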