Spark Streaming Kafka createDirectStream - Spark UI shows input event size as zero

Question

I have implemented Spark Streaming using createDirectStream. My Kafka producer sends several messages per second to a topic with two partitions.

On the Spark Streaming side, I read Kafka messages every second and then window them with a 5-second window size and slide interval.
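As a minimal sketch of the setup described above (broker address, topic name, and object name are hypothetical, not from the question), using the Spark 1.5 / Kafka 0.8 direct-stream API:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    // 1-second batch interval, as in the question
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical broker and topic; the question's topic has two partitions
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // 5-second window length and 5-second slide interval
    val windowed = stream.window(Seconds(5), Seconds(5))
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```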

The Kafka messages are processed correctly; I'm seeing the right computations and output.

But in the Spark Web UI, under the Streaming section, the number of events per window shows as zero. Please see this image:

I'm puzzled why it shows zero; shouldn't it show the number of Kafka messages being fed into the Spark stream?

Update:

This issue seems to happen when I use the groupByKeyAndWindow() API. When I commented out this API usage in my code, the Spark Streaming UI started reporting the Kafka event input size correctly.
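A sketch of the kind of call described in the update (the key-extraction logic is hypothetical, not from the question). Note that groupByKeyAndWindow introduces a shuffle, which may be why the stage's input metrics no longer reflect the Kafka source:

```scala
// Hypothetical pairing: key on the first comma-separated field of each message
val pairs = stream.map { case (_, msg) => (msg.split(",")(0), 1L) }

// Group values per key over a 5-second window with a 5-second slide.
// This shuffles data between stages; the update above observed that the
// UI's input-event size reads zero once this operator is in the pipeline.
val grouped = pairs.groupByKeyAndWindow(Seconds(5), Seconds(5))
grouped.print()
```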

Any idea why this is so? Could this be a defect in Spark Streaming?

I'm using Cloudera CDH 5.5.1, Spark 1.5.0, Kafka KAFKA-0.8.2.0-1.kafka1.4.0.p0.56.

Answer

It seems the input size is simply not recorded by the Spark Kafka library code.

Tracing through the Spark 2.3.1 source:

  1. Search for "Input Size / Records": it is the value of stageData.inputBytes (StagePage.scala).
  2. Search for StageData and inputBytes: it is the value of metrics.inputMetrics.bytesRead (LiveEntity.scala).
  3. Search for bytesRead: it is set in HadoopRDD.scala, FileScanRDD.scala, and ShuffleSuite.scala, but not in any Kafka-related file.
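Since the UI metric isn't populated for the Kafka source, a hedged workaround sketch (not part of the original answer) is to count records per batch yourself and log them:

```scala
// Diagnostic only: count the records in each batch RDD and log the result.
// rdd.count() triggers a job per batch, which is acceptable for debugging
// but adds overhead in production.
stream.foreachRDD { (rdd, time) =>
  println(s"Batch at $time: ${rdd.count()} Kafka records")
}
```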
