How to set group.id for a consumer group in the Kafka data source in Structured Streaming?
Question
I want to use Spark Structured Streaming to read from a secure Kafka cluster. This means that I need to force a specific group.id. However, as stated in the documentation, this is not possible. Still, the Databricks documentation at https://docs.azuredatabricks.net/spark/latest/structured-streaming/kafka.html#using-ssl says that it is possible. Does this only refer to Azure clusters?
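For context, reading from a secured Kafka cluster itself works in Spark 2.4.0: any option prefixed with `kafka.` is passed through to the underlying Kafka consumer. A minimal sketch, assuming SASL_SSL authentication; the broker address, topic name, truststore path, and passwords below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object SecureKafkaRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("secure-kafka-read")
      .getOrCreate()

    // Options prefixed with "kafka." are forwarded to the Kafka consumer.
    // Broker, topic, and truststore values are placeholders.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")
      .option("subscribe", "my-topic")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("kafka.ssl.truststore.location", "/path/to/truststore.jks")
      .option("kafka.ssl.truststore.password", "changeit")
      .option("startingOffsets", "latest")
      .load()

    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

The one exception to the pass-through rule is exactly the option in question: in Spark 2.4.x, adding `.option("kafka.group.id", ...)` is rejected by the source with an error, because Spark generates its own unique group id per query.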
Also, looking at the documentation on the master branch of the apache/spark repo, https://github.com/apache/spark/blob/master/docs/structured-streaming-kafka-integration.md, it seems such functionality is intended to be added in a later Spark release. Do you know of any plans for a stable release that will allow setting that consumer group.id?
If not, are there any workarounds in Spark 2.4.0 to set a specific consumer group.id?
Answer
Currently (v2.4.0) it is not possible.
You can check the following lines in the Apache Spark project:
In the master branch you can find a modification that enables setting a prefix or a particular group.id:

https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L543 - sets the previously generated groupId if kafka.group.id wasn't passed in the properties
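That master-branch change shipped in Spark 3.0, which exposes two options for influencing the consumer group. A sketch, assuming Spark 3.0+ (both options are rejected on 2.4.x); the broker address, topic, and group names are placeholders:

```scala
// Option 1: groupIdPrefix - Spark still appends a unique suffix per query,
// so resulting groups look like "my-app-<random>".
val withPrefix = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("subscribe", "my-topic")
  .option("groupIdPrefix", "my-app")
  .load()

// Option 2: kafka.group.id - a fixed group id. Use with care: concurrent
// or restarted queries sharing the same id can interfere with each other,
// since Spark's checkpoint, not Kafka's committed offsets, tracks progress.
val withFixedId = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("subscribe", "my-topic")
  .option("kafka.group.id", "my-fixed-group")
  .load()
```

For Spark 2.4.0 itself there is no supported workaround; if the security setup only needs to authorize a known group name, granting access to the generated group pattern (Spark's `spark-kafka-source-*` ids) on the Kafka side is the usual approach until an upgrade is possible.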