Why can I only see one Spark Streaming KafkaReceiver?


Question


I'm confused why I can only see one KafkaReceiver in the Spark web UI page (8080), even though I have 10 partitions in Kafka and use 10 cores in my Spark cluster. My code in Python is as follows:

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 10})

I would expect the number of KafkaReceivers to be 10 rather than 1. Thank you in advance!

Solution

kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 10})

That code creates 1 receiver with 10 threads. Each thread attaches to one partition, and all data is pulled by a single consumer using 1 core. All other cores will (potentially) process the received data.
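To make the receiver-versus-thread distinction concrete, here is a small stand-alone Python analogy (no Spark involved; create_stream below is a made-up stand-in for KafkaUtils.createStream, not the real API):

```python
# Made-up stand-in for KafkaUtils.createStream: each call always yields
# exactly one receiver, whose thread count is the sum of the per-topic
# thread counts passed in.
def create_stream(topics):
    return {"receivers": 1, "threads": sum(topics.values())}

topic = "my-topic"

# The question's call: one receiver pulling all partitions with 10 threads.
single = [create_stream({topic: 10})]

# The answer's pattern: ten separate streams, one receiver each, unioned.
unioned = [create_stream({topic: 1}) for _ in range(10)]

print(sum(s["receivers"] for s in single), sum(s["threads"] for s in single))
# 1 10
print(sum(s["receivers"] for s in unioned), sum(s["threads"] for s in unioned))
# 10 10
```

Both configurations run 10 consumer threads in total, but only the second shows 10 receivers in the web UI, because the UI counts receivers (one per createStream call), not threads.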

If you want to have 10 receivers, each one attached to 1 partition and using 1 core, you should do this (in Scala; my Python is weak, but you get the idea):

val recvs = (1 to 10).map(i => KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", Map(topic -> 1)))
val kafkaData = ssc.union(recvs)

Take into account that you will need additional cores for Spark to process the received data.
