Windows: Apache Spark History Server Config


Problem description

I want to use Spark's History Server to make use of the logging mechanism of my Web UI, but I am having some difficulty getting it to run on my Windows machine.

I have done the following:

Set my spark-defaults.conf file to reflect:

spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs

Set my spark-env.sh to reflect:

SPARK_LOG_DIR    "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS   "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"
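
(A note on the two lines above: spark-env.sh is sourced by sh, so settings there have to be written as export NAME=value; a bare NAME "value" line is executed as a command, which is exactly what the SPARK_LOG_DIR: command not found messages in the error below are complaining about. Keeping the same values, a corrected sketch of those two lines would look like this:)

# sh syntax: '=' with no spaces around it, plus export (values left as in the original lines)
export SPARK_LOG_DIR="file://C:/spark-1.6.2-bin-hadoop2.6/logs"
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"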

I am using Git-BASH to run the start-history-server.sh file, like this:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh

And I get this error:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
  Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
  ========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out

The full log from the output can be found below:

Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================

I am running a sparkR script in which I initialize my Spark context and then call init().

Please advise whether I should be running the history server before I run my Spark script.

Pointers & tips on how to proceed (with respect to logging) would be greatly appreciated.

Recommended answer

On Windows you need to run Spark's .cmd files, not the .sh scripts. From what I have seen, there is no .cmd script for the Spark History Server, so it basically has to be run manually.

I followed the history server's Linux script, and to run it manually on Windows you need to take the following steps:

  • All history server configurations should be set in the spark-defaults.conf file (remove the .template suffix), as described below.
  • Go to the Spark conf directory and add the spark.history.* configurations to %SPARK_HOME%/conf/spark-defaults.conf, as follows:

spark.eventLog.enabled           true
spark.history.fs.logDirectory    file:///c:/logs/dir/path
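
For completeness: the history server only reads spark.history.fs.logDirectory, while each application writes its event log to spark.eventLog.dir, so in practice both usually point at the same directory; note also the three slashes in the Windows file URI (file:///c:/..., not file://C:/...). A fuller sketch of the same block, using the placeholder path from above:

# applications write their event logs here, the history server reads from here
spark.eventLog.enabled           true
spark.eventLog.dir               file:///c:/logs/dir/path
spark.history.fs.logDirectory    file:///c:/logs/dir/path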

After the configuration is finished, run the following command from %SPARK_HOME%:

bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

The output should look something like this:

16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
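
Two practical points worth keeping in mind: the directory behind spark.history.fs.logDirectory has to exist before the server starts, and, as far as I recall, the same -D option can also be passed through the SPARK_HISTORY_OPTS environment variable instead of spark-defaults.conf. Roughly, from a Windows command prompt (same placeholder path as above):

REM create the event-log directory first; the history server will not start if it is missing
mkdir c:\logs\dir\path

REM optional: pass the log directory via SPARK_HISTORY_OPTS instead of spark-defaults.conf
set SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=file:///c:/logs/dir/path

bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer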

Hope it helps! :-)
