Spark Streaming Kafka createDirectStream - Spark UI shows input event size as zero
Problem Description
I have implemented Spark Streaming using createDirectStream. My Kafka producer sends several messages per second to a topic with two partitions.
On the Spark Streaming side, I read the Kafka messages every second and then window them with a 5-second window size and slide interval.
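The setup described above can be sketched roughly as follows. This is a minimal sketch against the Spark 1.5 / Kafka 0.8 direct-stream API; the app name, broker address, and topic name are assumptions, not taken from the post:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaDirectWindow")
    // 1-second batch interval: read Kafka messages every second
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed broker address and topic name
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("events")

    // Receiver-less direct stream over the two-partition topic
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // 5-second window size and 5-second slide interval
    val windowed = stream.map(_._2).window(Seconds(5), Seconds(5))
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Running this requires a Spark cluster and a reachable Kafka 0.8 broker; it is meant only to pin down the shape of the pipeline being discussed.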
The Kafka messages are processed correctly; I'm seeing the right computations and prints.
But in the Spark Web UI, under the Streaming section, it shows the number of events per window as zero. Please see this image:
I'm puzzled why it shows zero; shouldn't it show the number of Kafka messages being fed into the Spark stream?
Update:
This issue seems to happen when I use the groupByKeyAndWindow() API. When I commented out this API call, the Spark Streaming UI started reporting the Kafka event input size correctly.
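For reference, the call in question looks roughly like this. It is a sketch only: `kafkaStream` stands for the DStream returned by createDirectStream, and the key extraction is an assumption, since the post does not show the actual pairing logic:

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// kafkaStream: DStream[(String, String)] from createDirectStream.
// Key each message on an assumed first comma-separated field.
def windowByKey(kafkaStream: DStream[(String, String)]): DStream[(String, Iterable[String])] = {
  val pairs = kafkaStream.map { case (_, value) =>
    (value.split(",")(0), value)
  }
  // Group values per key over a 5-second window sliding every 5 seconds;
  // this is the call that coincides with the zero input size in the UI
  pairs.groupByKeyAndWindow(Seconds(5), Seconds(5))
}
```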
Any idea why this is so? Could this be a defect in Spark Streaming?
I'm using Cloudera CDH 5.5.1, Spark 1.5.0, Kafka KAFKA-0.8.2.0-1.kafka1.4.0.p0.56.
Answer
It seems the input size is simply not recorded by the Spark Kafka library code.
Tracing through the Spark 2.3.1 source:

- Search for `Input Size / Records`: it is the value of `stageData.inputBytes` (StagePage.scala).
- Search for `StageData` and `inputBytes`: it is the value of `metrics.inputMetrics.bytesRead` (LiveEntity.scala).
- Search for `bytesRead`: it is set in HadoopRDD.scala, FileScanRDD.scala and ShuffleSuite.scala, but not in any Kafka-related files.
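As a practical workaround (not part of the original answer), the per-batch record count can still be observed on the driver with a StreamingListener, which is fed by the streaming scheduler's batch bookkeeping rather than the stage-level `bytesRead` metric that the Kafka RDD never populates. The exact BatchInfo fields vary slightly across Spark versions; this sketch is written against the 1.5-era API:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Logs the number of input records of every completed batch;
// this count is tracked independently of stageData.inputBytes.
class BatchRecordCountListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    println(s"Batch at ${info.batchTime}: ${info.numRecords} records")
  }
}

// Registration, assuming ssc is the StreamingContext:
// ssc.addStreamingListener(new BatchRecordCountListener)
```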