Set spark.local.dir to different drive


Question

I'm trying to set up standalone Spark on Windows 10. I would like to set spark.local.dir to D:\spark-tmp\tmp, as it currently appears to be using C:\Users\<me>\AppData\Local\Temp, which in my case is on an SSD drive that might not have enough space given the size of some datasets.

So I changed the file %SPARK_HOME%\conf\spark-defaults.conf to the following, without success:

spark.eventLog.enabled           true
spark.eventLog.dir               file:/D:/spark-tmp/log
spark.local.dir                  file:/D:/spark-tmp/tmp

I also tried to run %HADOOP_HOME%\bin\winutils.exe chmod -R 777 D:/spark-tmp, but it didn't change anything.

The error I get is the following:

java.io.IOException: Failed to create a temp directory (under file:/D:/spark-tmp/tmp) after 10 attempts!

If I start the path with file://D:/... (note the double slash) nothing changes. If I remove the scheme entirely, a different exception says that the scheme D: is not recognized.

I also noticed this warning:

WARN  SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).

So I tried to put the following line in %SPARK_HOME%\conf\spark-env.sh:

SPARK_LOCAL_DIRS=file:/D:/spark-tmp/tmp

If I put this line in and comment out the spark.local.dir line in the .conf file, Spark works perfectly, but the temporary files are still saved in my AppData\Local\Temp folder. So the SPARK_LOCAL_DIRS line is not read.

What's strange is that, if I let it run, it actually puts the logs in D:/spark-tmp/log, which means that it's not a problem of syntax or permissions.

Answer

On Windows you will have to set these as environment variables.

Add the key-value pair

SPARK_LOCAL_DIRS -> d:\spark-tmp\tmp 

to your system environment variables.
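For example (a sketch assuming a standard Windows setup), you can add the variable persistently from a command prompt with setx; note that the value is a plain path, with no file: scheme:

setx SPARK_LOCAL_DIRS "D:\spark-tmp\tmp"

Only newly opened command prompts see the variable, so restart the shell you launch Spark from. Alternatively, add it through the System Properties > Environment Variables dialog, or pass /M to setx from an elevated prompt if you want it machine-wide.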

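If you would rather keep the setting inside the Spark installation, the Windows launch scripts appear to read conf\spark-env.cmd rather than spark-env.sh, which would also explain why the SPARK_LOCAL_DIRS line in spark-env.sh above was ignored. A minimal sketch, assuming that behaviour, is a spark-env.cmd containing:

rem %SPARK_HOME%\conf\spark-env.cmd  (sketch, not from the original answer)
set SPARK_LOCAL_DIRS=D:\spark-tmp\tmp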

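As for the original error: spark.local.dir (like SPARK_LOCAL_DIRS) expects plain local directory paths rather than file: URIs, which is presumably why Spark kept trying to create a temp directory under the literal path file:/D:/spark-tmp/tmp. If you do want to keep the setting in spark-defaults.conf instead of an environment variable, a sketch without the scheme (an assumption, not verified on this setup) would be:

spark.local.dir                  D:/spark-tmp/tmp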