Spark batch write to Kafka topic from multi-column DataFrame
Question
After the batch Spark ETL, I need to write the resulting DataFrame, which contains multiple different columns, to a Kafka topic.
According to the Spark documentation (https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html), a DataFrame being written to Kafka must have the following mandatory column in its schema:
value (required) — string or binary
As I mentioned, I have many more columns with values, so my question is: how do I properly send a whole DataFrame row as a single message to a Kafka topic from my Spark application? Do I need to combine all of the values from all columns into a new DataFrame with a single value column (that contains the combined value), or is there a more proper way to achieve this?
Answer
The proper way to do that is already hinted at by the docs, and it doesn't really differ from what you'd do with any Kafka client - you have to serialize the payload before sending it to Kafka.
How you do that (to_json, to_csv, Apache Avro) depends on your business requirements - nobody can answer this but you (or your team).