Windows: Apache Spark History Server Config

Problem Description

I wanted to use Spark's History Server to take advantage of the Web UI's logging mechanisms, but I am having some difficulty running it on my Windows machine.

I have done the following:

Set my spark-defaults.conf file to reflect:

spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs

Set my spark-env.sh to reflect:

SPARK_LOG_DIR    "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS   "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"

I am using Git-BASH to run the start-history-server.sh file, like this:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh

And I receive this error:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
  Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
  ========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out

The full log from the output can be found below:

Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================

I am running a SparkR script in which I initialize my Spark context and then call init().

Please advise: should I be running the history server before I run my Spark script?

Pointers and tips on how to proceed (with respect to logging) would be greatly appreciated.

Recommended Answer

On Windows you need to run Spark's .cmd files, not the .sh ones. From what I have seen, there is no .cmd script for the Spark History Server, so it basically needs to be started manually.

I followed the Linux history server script; to run it manually on Windows you need to take the following steps:

  • All history server configurations should be set in the spark-defaults.conf file (remove the .template suffix), as described below.
  • Go to the Spark config directory and add the spark.history.* configurations to %SPARK_HOME%/conf/spark-defaults.conf, as follows:

spark.eventLog.enabled true
spark.history.fs.logDirectory file:///c:/logs/dir/path
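For reference, a fuller spark-defaults.conf sketch that both writes and serves event logs from the same Windows directory might look like the lines below. This is only an illustration: c:/logs/dir/path is the placeholder used above, and the directory typically has to exist before an application starts writing event logs to it.

spark.eventLog.enabled            true
spark.eventLog.dir                file:///c:/logs/dir/path
spark.history.fs.logDirectory     file:///c:/logs/dir/path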

After the configuration is finished, run the following command from %SPARK_HOME%:

bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

The output should look something like this:

16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
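If you would rather not type the command each time, a small wrapper can stand in for the missing Windows start script. The following is only a sketch: the file name start-history-server.cmd is hypothetical (Spark 1.6.x does not ship one), and it assumes SPARK_HOME is set and that spark-defaults.conf already contains the spark.history.* settings above.

@echo off
rem Hypothetical start-history-server.cmd -- not part of the Spark 1.6.x distribution.
rem Assumes SPARK_HOME points at the Spark install, e.g. C:\spark-1.6.2-bin-hadoop2.6,
rem and that conf\spark-defaults.conf already holds the spark.history.* settings.

if "%SPARK_HOME%"=="" (
    echo SPARK_HOME is not set.
    exit /b 1
)

cd /d "%SPARK_HOME%"
call bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer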

Hope that helps! :-)
