Spark Streaming Kafka createDirectStream - Spark UI shows input event size as zero


Problem description

I have implemented Spark Streaming using createDirectStream. My Kafka producer is sending several messages every second to a topic with two partitions.

On the Spark Streaming side, I read the Kafka messages every second and window them with a 5-second window size and slide interval.

The Kafka messages are processed properly; I'm seeing the right computations and printed output.

But in the Spark Web UI, under the Streaming section, the number of events per window is shown as zero. Please see this image:

I'm puzzled why it shows zero; shouldn't it show the number of Kafka messages being fed into the Spark stream?

Updated:

This issue seems to happen when I use the groupByKeyAndWindow() API. When I commented that call out of my code, the Spark Streaming UI started reporting the Kafka event input size correctly.
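A minimal sketch of the kind of pipeline described above, using the Spark 1.x direct-stream API for Kafka 0.8 (the broker address, topic name, and key-extraction step are assumptions for illustration, not from the original code):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-window-sketch")
    // 1-second batch interval, as described in the question
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical broker list; the topic has two partitions
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events") // hypothetical topic name

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // 5-second window size and slide, as described in the question; this is
    // the kind of call that reportedly coincided with zero event counts in the UI:
    stream.map { case (_, v) => (v, 1) } // hypothetical key extraction
      .groupByKeyAndWindow(Seconds(5), Seconds(5))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```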

Any idea why this is so? Could this be a defect in Spark Streaming?

I'm using Cloudera CDH: 5.5.1, Spark: 1.5.0, Kafka: KAFKA-0.8.2.0-1.kafka1.4.0.p0.56

Answer

It seems that the input size is simply not recorded by the Spark Kafka library code.

Based on the Spark 2.3.1 source:

  1. Searching for Input Size / Records shows it is the value of stageData.inputBytes (StagePage.scala).
  2. Searching for StageData and inputBytes shows it is the value of metrics.inputMetrics.bytesRead (LiveEntity.scala).
  3. Searching for bytesRead shows it is set in HadoopRDD.scala, FileScanRDD.scala and ShuffleSuite.scala, but not in any Kafka-related files.
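For comparison, the Hadoop-based RDDs update the per-task input metrics inside their compute() method, roughly as paraphrased below (simplified from the Spark 2.3.1 source, not an exact excerpt); KafkaRDD performs no equivalent update, which would explain why the UI's Input Size / Records stays at zero:

```scala
// Paraphrased from HadoopRDD.compute() in Spark 2.3.1 (simplified, not exact):
val inputMetrics = context.taskMetrics().inputMetrics
// ...for each record produced by the underlying RecordReader:
inputMetrics.incRecordsRead(1)
// ...and periodically, via a bytes-read callback from the file system statistics:
inputMetrics.incBytesRead(bytesReadCallback())
```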
