Spark: Monitoring a cluster mode application


Question

Right now I'm using spark-submit to launch an application in cluster mode. The response from the master server gives a JSON object with a submissionId, which I use to identify the application and kill it if necessary. However, I haven't found a simple way to retrieve the worker REST URL from the master server's response or from the driver id (I could probably scrape the master web UI, but that would be ugly). Instead, I have to wait until the application finishes and then look up the application statistics from the history server.

Is there any way to use the driver-id to identify the worker URL for an application deployed in cluster mode (usually at worker-node:4040)?
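As an aside, the status and kill calls can be scripted against the REST submission endpoints of the kind that show up in the logs below (/v1/submissions/status appears there verbatim; /v1/submissions/kill is assumed to be served alongside it, as spark-submit --kill uses it). A minimal sketch in Python, with the master host/port as a placeholder:

import json
import urllib.request

MASTER_REST = "http://masterurl:7077"  # placeholder; point this at your master's REST URL

def submission_status(submission_id):
    # GET /v1/submissions/status/<driver-id>, as seen in the DEBUG log below
    url = "{}/v1/submissions/status/{}".format(MASTER_REST, submission_id)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def kill_submission(submission_id):
    # POST /v1/submissions/kill/<driver-id> with an empty body
    url = "{}/v1/submissions/kill/{}".format(MASTER_REST, submission_id)
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(submission_status("driver-20160812114003-0001").get("driverState"))  # e.g. RUNNING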

16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request to launch an application in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: Submission successfully created as driver-20160812114003-0001. Polling submission state...
16/08/12 11:39:47 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160812114003-0001 in spark://192.yyy:6066.
16/08/12 11:39:47 INFO RestSubmissionClient: State of driver driver-20160812114003-0001 is now RUNNING.
16/08/12 11:39:47 INFO RestSubmissionClient: Driver is running on worker worker-20160812113715-192.xxx-46215 at 192.xxx:46215.
16/08/12 11:39:47 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
    "action" : "CreateSubmissionResponse",
    "message" : "Driver successfully submitted as driver-20160812114003-0001",
    "serverSparkVersion" : "1.6.1",
    "submissionId" : "driver-20160812114003-0001",
    "success" : true
}
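The worker host and port do appear in the client log above ("Driver is running on worker ... at 192.xxx:46215"), so one stopgap is to capture them from the spark-submit output instead of scraping the web UI. A hedged sketch; the regex is an assumption keyed to that one log line, not a stable interface:

import re

WORKER_RE = re.compile(r"Driver is running on worker (\S+) at (\S+):(\d+)")

def find_worker_ui(log_text):
    m = WORKER_RE.search(log_text)
    if m is None:
        return None
    worker_id, host, _port = m.groups()
    # The application UI normally runs on the worker node at port 4040,
    # not on the worker's own port, hence the hard-coded 4040 here.
    return worker_id, "http://{}:4040".format(host)

with open("spark-submit.log") as f:  # hypothetical capture of the client output
    print(find_worker_ui(f.read()))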


Here's what a typical output looks like with log4j console output at DEBUG:

Spark-submit command:

./apps/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --master mesos://masterurl:7077 \
    --verbose --class MainClass --deploy-mode cluster \
    ~/path/myjar.jar args

Spark-submit output:

Using properties file: null
Parsed arguments:
  master                  mesos://masterurl:7077
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               MyApp
  primaryResource         file:/path/myjar.jar
  name                    MyApp
  childArgs               [args]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:



Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/path/myjar.jar
MyApp
args
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> MyApp
spark.jars -> file:/path/myjar.jar
spark.submit.deployMode -> cluster
spark.master -> mesos://masterurl:7077
Classpath elements:



16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending POST request to server at http://masterurl:7077/v1/submissions/create:
{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ args ],
  "appResource" : "file:/path/myjar.jar",
  "clientSparkVersion" : "2.0.0",
  "environmentVariables" : {
    "SPARK_SCALA_VERSION" : "2.10"
  },
  "mainClass" : "SimpleSort",
  "sparkProperties" : {
    "spark.jars" : "file:/path/myjar.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyApp",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "mesos://masterurl:7077"
  }
}
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: Submission successfully created as driver-20160817132658-0004. Polling submission state...
16/08/17 13:26:49 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160817132658-0004 in mesos://masterurl:7077.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Sending GET request to server at http://masterurl:7077/v1/submissions/status/driver-20160817132658-0004.
16/08/17 13:26:49 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}
16/08/17 13:26:49 INFO RestSubmissionClient: State of driver driver-20160817132658-0004 is now RUNNING.
16/08/17 13:26:49 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160817132658-0004",
  "success" : true
}

Answer

Doesn't the master server's response provide an application-id?

I believe all you need for this are the master-URL and the application-id of your application. Once you have the application-id, use port 4040 at the master-URL and append your intended endpoint to it.

For example, if your application-id is application_1468141556944_1055:

To get a list of all jobs:

http://<master>:4040/api/v1/applications/application_1468141556944_1055/jobs

To get a list of stored RDDs:

http://<master>:4040/api/v1/applications/application_1468141556944_1055/storage/rdd
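If you'd rather hit these endpoints from code than from a browser, here is a small sketch; the <master> host is the same placeholder as in the URLs above, and the field names follow Spark's documented monitoring REST API:

import json
import urllib.request

BASE = "http://<master>:4040/api/v1/applications"  # replace <master> with the real host
APP_ID = "application_1468141556944_1055"

def get(endpoint):
    url = "{}/{}/{}".format(BASE, APP_ID, endpoint)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

for job in get("jobs"):
    print(job["jobId"], job["status"])

for rdd in get("storage/rdd"):
    print(rdd["id"], rdd["name"])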

However, if you don't have the application-id, I would probably start with the following:

Set verbose mode (--verbose) when launching the Spark job to get the application-id on the console. You can then parse the log output for the application-id. The log output usually looks like this:

16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)

Thus, the application-id is application_1468141556944_3791.
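That parse is a one-liner; the regex below is just an assumption keyed to the "Application report" line format shown above:

import re

APP_ID_RE = re.compile(r"Application report for (application_\d+_\d+)")

line = "16/08/12 08:50:53 INFO Client: Application report for application_1468141556944_3791 (state: RUNNING)"
m = APP_ID_RE.search(line)
print(m.group(1) if m else None)  # application_1468141556944_3791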

You can also find the master-URL and the application-id through the tracking URL in the log output, which looks like this:

    client token: N/A
    diagnostics: N/A
    ApplicationMaster host: 10.50.0.33
    ApplicationMaster RPC port: 0
    queue: ns_debug
    start time: 1470992969127
    final status: UNDEFINED
    tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/
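The same kind of parse recovers both pieces from the tracking URL line (again, the pattern only assumes this log format):

import re

line = "tracking URL: http://<master>:8088/proxy/application_1468141556944_3799/"
m = re.search(r"tracking URL: http://([^:/]+):\d+/proxy/(application_\w+)/", line)
if m:
    master_host, app_id = m.groups()
    print(master_host, app_id)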

These messages are at the INFO log level, so make sure you set log4j.rootCategory=INFO, console in your log4j.properties file so that you can see them.
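For reference, this is the setting that Spark's stock conf/log4j.properties.template ships with:

# conf/log4j.properties -- keep the root logger at INFO so the
# "Application report" lines above reach the console
log4j.rootCategory=INFO, console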
