How to access statistics endpoint for a Spark Streaming application?

Problem description

As of Spark 2.2.0, there are new endpoints in the API for getting information about streaming jobs.

I run Spark on EMR clusters, using Spark 2.2.0 in cluster mode.

When I hit the endpoint for my streaming jobs, all it gives me is the error message:

no streaming listener attached to <stream name>

I've dug through the Spark codebase a bit, but this feature is not very well documented. So I'm curious if this is a bug? Is there some configuration I need to do to get this endpoint working?

This appears to be an issue specifically when running on the cluster. The same code running on Spark 2.2.0 on my local machine shows the statistics as expected, but gives that error message when run on the cluster.

Recommended answer

I'm using the very latest Spark 2.3.0-SNAPSHOT built today from master, so YMMV. It worked fine.

Is there some configuration I need to do to get this endpoint working?

No. It's supposed to work fine with no changes to the default configuration.

Make sure you use the host and port of the driver (as rumors are that you could also access 18080 of the Spark History Server, which does show all the same endpoints and the same jobs running, but with no streaming listener attached).
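
To make the distinction concrete, here is a minimal sketch of the two base URLs in question. The host names and the application id are placeholders for your own cluster; per the note above, only the driver's UI is expected to have the streaming listener behind it.

    // A minimal sketch; "driver-host", "history-host" and the application id
    // are placeholders for your own cluster.
    val appId = "application_1234_0001"

    // While the job is running, the driver's web UI (default port 4040)
    // serves the streaming endpoints backed by the streaming listener.
    val driverStats =
      s"http://driver-host:4040/api/v1/applications/$appId/streaming/statistics"

    // The History Server (default port 18080) exposes the same path, but,
    // as noted above, without a streaming listener attached.
    val historyStats =
      s"http://history-host:18080/api/v1/applications/$appId/streaming/statistics"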

As you can see in the source code where the error message lives, it can happen only when ui.getStreamingJobProgressListener has not been registered (which ends up in case None).
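
As a rough paraphrase of that behaviour (not Spark's actual source; the names below are placeholders), the resource only has statistics to serve when a listener has been registered:

    // A simplified paraphrase of the behaviour described above, not the
    // actual Spark code; StreamingListenerStandIn is a placeholder type.
    final case class StreamingListenerStandIn(appName: String)

    def statisticsOrError(listener: Option[StreamingListenerStandIn],
                          appName: String): String =
      listener match {
        case Some(_) => """{"batchDuration": 1000}"""                  // would render the real statistics
        case None    => s"no streaming listener attached to $appName"  // the error seen on the cluster
      }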

So the question now is why would that SparkListener not be registered?

That leads us to the streamingJobProgressListener var that is set using the setStreamingJobProgressListener method, exclusively while StreamingTab is being instantiated (which was the reason why I asked you if you can see the Streaming tab).

In other words, if you see the Streaming tab in the web UI, you have the streaming metric endpoint(s) available. Check the URL to the endpoint, which should be in the format:

http://[driverHost]:[port]/api/v1/applications/[appId]/streaming/statistics
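
For example, here is a minimal sketch of querying that endpoint from Scala; the host, port and application id are assumptions to be replaced with your driver's values.

    // A minimal sketch; host, port and application id are assumptions
    // to be replaced with your driver's values.
    import scala.io.Source

    val driverHost = "localhost"
    val port       = 4040
    val appId      = "local-1513151634282"

    val url  = s"http://$driverHost:$port/api/v1/applications/$appId/streaming/statistics"
    val json = Source.fromURL(url).mkString   // raw statistics JSON, as in the sample output further below
    println(json)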


I tried to reproduce your case and did the following, which led me to a working setup.

  1. Started one of the official examples of Spark Streaming applications.

$ ./bin/run-example streaming.StatefulNetworkWordCount localhost 9999

I did run nc -lk 9999 first.
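
For reference, here is a stripped-down sketch of what such an example does (a socket word count over 1-second batches); it is not the official StatefulNetworkWordCount itself, which additionally keeps running counts with mapWithState.

    // A stripped-down sketch of a socket word count over 1-second batches;
    // not the official StatefulNetworkWordCount example itself.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SocketWordCountSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SocketWordCountSketch")
          .setMaster("local[2]")   // for a local run; drop this when submitting to a cluster
        val ssc = new StreamingContext(conf, Seconds(1))

        val lines  = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
        counts.print()

        ssc.start()              // once the context runs, the Streaming tab should appear in the web UI
        ssc.awaitTermination()
      }
    }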

  2. Opened the web UI @ http://localhost:4040/streaming to make sure the Streaming tab is there.

  3. Made sure http://localhost:4040/api/v1/applications/ responds with application ids.

$ http http://localhost:4040/api/v1/applications/
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 266
Content-Type: application/json
Date: Wed, 13 Dec 2017 07:58:04 GMT
Server: Jetty(9.3.z-SNAPSHOT)
Vary: Accept-Encoding, User-Agent

[
    {
        "attempts": [
            {
                "appSparkVersion": "2.3.0-SNAPSHOT",
                "completed": false,
                "duration": 0,
                "endTime": "1969-12-31T23:59:59.999GMT",
                "endTimeEpoch": -1,
                "lastUpdated": "2017-12-13T07:53:53.751GMT",
                "lastUpdatedEpoch": 1513151633751,
                "sparkUser": "jacek",
                "startTime": "2017-12-13T07:53:53.751GMT",
                "startTimeEpoch": 1513151633751
            }
        ],
        "id": "local-1513151634282",
        "name": "StatefulNetworkWordCount"
    }
]
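
As a small follow-up sketch, you could pull the application id out of that response and build the statistics URL from it; the regex-based parsing is only for illustration, and a JSON library would be the more robust choice.

    // A small sketch: extract the first "id" from /api/v1/applications/
    // and build the statistics URL from it. Regex parsing is only for
    // illustration; use a JSON library for anything serious.
    import scala.io.Source

    val base = "http://localhost:4040/api/v1/applications"
    val apps = Source.fromURL(s"$base/").mkString

    val appId = """"id"\s*:\s*"([^"]+)"""".r
      .findFirstMatchIn(apps)
      .map(_.group(1))
      .getOrElse(sys.error("no application id found"))

    println(s"$base/$appId/streaming/statistics")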

  4. Accessed the Spark Streaming application's statistics endpoint @ http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics.

    $ http http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics
    HTTP/1.1 200 OK
    Content-Encoding: gzip
    Content-Length: 219
    Content-Type: application/json
    Date: Wed, 13 Dec 2017 08:00:10 GMT
    Server: Jetty(9.3.z-SNAPSHOT)
    Vary: Accept-Encoding, User-Agent
    
    {
        "avgInputRate": 0.0,
        "avgProcessingTime": 30,
        "avgSchedulingDelay": 0,
        "avgTotalDelay": 30,
        "batchDuration": 1000,
        "numActiveBatches": 0,
        "numActiveReceivers": 1,
        "numInactiveReceivers": 0,
        "numProcessedRecords": 0,
        "numReceivedRecords": 0,
        "numReceivers": 1,
        "numRetainedCompletedBatches": 376,
        "numTotalCompletedBatches": 376,
        "startTime": "2017-12-13T07:53:54.921GMT"
    }
    
