Set spark.local.dir to different drive
Question
I'm trying to set up standalone Spark on Windows 10. I would like to set spark.local.dir to D:\spark-tmp\tmp, as currently it appears to be using C:\Users\<me>\AppData\Local\Temp, which in my case is on an SSD drive that might not have enough space given the size of some datasets.
So I changed the file %SPARK_HOME%\conf\spark-defaults.conf to the following, without success:
spark.eventLog.enabled true
spark.eventLog.dir file:/D:/spark-tmp/log
spark.local.dir file:/D:/spark-tmp/tmp
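(A guess at the likely culprit, based on the error below: spark.local.dir takes a comma-separated list of plain local directory paths rather than URIs, unlike spark.eventLog.dir, so the scheme-less equivalent would look like this.)

```
spark.eventLog.enabled  true
spark.eventLog.dir      file:/D:/spark-tmp/log
spark.local.dir         D:\spark-tmp\tmp
```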
I also tried to run %HADOOP_HOME%\bin\winutils.exe chmod -R 777 D:/spark-tmp, but it didn't change anything.
The error I get is the following:
java.io.IOException: Failed to create a temp directory (under file:/D:/spark-tmp/tmp) after 10 attempts!
If I start the path with file://D:/... (note the double slash), nothing changes. If I remove the scheme altogether, a different exception says that the scheme D: is not recognized.
I also noticed this warning:
WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
So I tried to put the following line in %SPARK_HOME%\conf\spark-env.sh:
SPARK_LOCAL_DIRS=file:/D:/spark-tmp/tmp
If I put this line and comment out the spark.local.dir line in the .conf file, Spark works perfectly, but the temporary files are still saved in my AppData\Local\Temp folder. So the SPARK_LOCAL_DIRS line is not read.
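A possible explanation for the line being ignored, assuming the standard launch scripts are used: on Windows, Spark is started through the bin\*.cmd scripts, which load %SPARK_HOME%\conf\spark-env.cmd rather than spark-env.sh, so anything placed in the .sh file is silently skipped. The equivalent setting as a batch file would be something like:

```
rem %SPARK_HOME%\conf\spark-env.cmd — loaded on Windows instead of spark-env.sh
set SPARK_LOCAL_DIRS=D:\spark-tmp\tmp
```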
What's strange is that, if I let it run, it actually puts logs in D:/spark-tmp/log, which means that it's not a problem of syntax or permissions.
Answer
On Windows you will have to set these as environment variables. Add the key-value pair

SPARK_LOCAL_DIRS -> d:\spark-tmp\tmp

to your system environment variables.
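One way to do this from a command prompt, as a sketch, is the built-in setx tool, which writes the variable to the user environment (it takes effect in newly opened shells, not the current one; use the System Properties dialog for a machine-wide setting):

```
setx SPARK_LOCAL_DIRS "d:\spark-tmp\tmp"
```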