Why can I only see one Spark Streaming KafkaReceiver?
Question
I'm confused about why I can only see one KafkaReceiver in the Spark web UI page (8080). My Kafka topic has 10 partitions, and I allocated 10 cores in my Spark cluster. My Python code is as follows:

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 10})

I expected the number of KafkaReceivers to be 10 rather than 1. Thank you in advance!
Answer
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",{topic: 10})
That code creates 1 receiver with 10 threads. Each thread attaches to one partition, and all data is pulled by a single consumer using 1 core. All the other cores will (potentially) process the received data.
If you want 10 receivers, each one attached to 1 partition and using 1 core, you should do this (in Scala; my Python is weak, but you get the idea):
val recvs = (1 to 10).map(i => KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", Map(topic -> 1)))
val kafkaData = ssc.union(recvs)
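For reference, a rough Python equivalent of the same idea might look like the sketch below. It assumes the receiver-based `KafkaUtils.createStream` API from `pyspark.streaming.kafka` (available in Spark 1.x/2.x) and a running ZooKeeper/Kafka; the `zkQuorum` address and topic name are placeholders:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="multi-receiver-example")
ssc = StreamingContext(sc, batchDuration=2)

zkQuorum = "localhost:2181"   # assumed ZooKeeper address
topic = "my-topic"            # assumed topic name

# Create 10 receivers, each with a single consumer thread,
# instead of 1 receiver with 10 threads. Each receiver occupies
# one core and (via the consumer group) gets assigned one partition.
streams = [
    KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
    for _ in range(10)
]

# Union them into a single DStream for downstream processing.
kafkaData = ssc.union(*streams)
```

Note that `StreamingContext.union` in PySpark takes the streams as separate arguments, hence the `*streams` unpacking.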
Take into account that you will need additional cores for Spark to process the received data.