Windows: Apache Spark History Server Config
Problem description
I want to use Spark's History Server to take advantage of the logging mechanisms of my Web UI, but I am finding it difficult to get this running on my Windows machine.
I did the following:
Set my spark-defaults.conf file to reflect:
spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs
Set my spark-env.sh to reflect:
SPARK_LOG_DIR "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
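As an aside, the two lines above are not valid shell assignments as written (there is no `=` between the variable name and its value), which is what produces the `SPARK_LOG_DIR: command not found` errors in the output further down. A hedged sketch of the syntax spark-env.sh normally uses (values copied from the question; whether the file:// URI scheme works correctly on Windows is a separate issue):

```shell
# spark-env.sh -- shell variables need VAR=value, with no space around '='
SPARK_LOG_DIR="file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
```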
I am running the start-history-server.sh file using Git-BASH, like this:
USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
And I get this error:
USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
The full log from the output can be found below:
Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================
I am running a SparkR script where I initialize my Spark context and then call init().
Please advise whether I should be running the history server before I run my Spark script. Pointers and tips on how to proceed (with respect to logging) would be greatly appreciated.
Answer
On Windows you need to run Spark's .cmd files, not the .sh ones. From what I can see, there is no .cmd script for the Spark History Server, so it basically has to be started manually.
I followed the history server's Linux script; to run it manually on Windows, take the following steps:
- All history server configurations should be set in the spark-defaults.conf file (remove the .template suffix) as described below. Go to the Spark config directory and add the spark.history.* configurations to %SPARK_HOME%/conf/spark-defaults.conf, as follows:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///c:/logs/dir/path
- After configuration is finished, run the following command from %SPARK_HOME%:
bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
The output should look like this:
16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
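Once the log reports the server is up, the Web UI is served on port 18080 by default, as the log above shows. A minimal hedged sketch for checking reachability from a script, assuming only that the server answers plain HTTP on its host and port (the helper names here are illustrative, not part of Spark):

```python
from urllib.request import urlopen

def history_ui_url(host="localhost", port=18080):
    """Build the History Server UI URL (18080 is the default port)."""
    return f"http://{host}:{port}"

def is_history_server_up(host="localhost", port=18080, timeout=2):
    """Return True if the History Server UI answers an HTTP request."""
    try:
        with urlopen(history_ui_url(host, port), timeout=timeout) as resp:
            return resp.getcode() == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

print(history_ui_url())  # http://localhost:18080
```

If this returns False, the server is not listening yet; re-check the spark.history.fs.logDirectory path and that the event log directory actually exists.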
Hope it helps! :-)