Is it possible to let Spark Structured Streaming (update mode) write to a DB?


Problem Description

I use Spark (3.0.0) Structured Streaming to read a topic from Kafka.

I used joins and then mapGroupsWithState to get my stream data, so I have to use update mode, based on my understanding of the Spark official guide: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes
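For context, here is a minimal sketch of the kind of query described above. The topic name, bootstrap servers, the CSV payload format, and the `Event` record are all assumptions for illustration, not details from the original question; the point is that keeping the latest row per key via mapGroupsWithState forces update output mode.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(id: String, ts: Timestamp, value: String)

val spark = SparkSession.builder.appName("stateful-dedup").getOrCreate()
import spark.implicits._

// Read the raw stream from Kafka (topic and servers are placeholders).
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// Assume a simple "id,epochMillis,value" payload for illustration.
val events = raw.selectExpr("CAST(value AS STRING) AS line").as[String].map { line =>
  val Array(id, ts, v) = line.split(",", 3)
  Event(id, new Timestamp(ts.toLong), v)
}

// Keep only the newest event per id, carrying it across micro-batches as state.
def keepLatest(id: String, rows: Iterator[Event], state: GroupState[Event]): Event = {
  val newest = (rows ++ state.getOption.iterator).maxBy(_.ts.getTime)
  state.update(newest)
  newest
}

val deduped = events
  .groupByKey(_.id)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout)(keepLatest)

// mapGroupsWithState only supports update output mode, hence the console sink here.
deduped.writeStream
  .outputMode("update")
  .format("console")
  .start()
```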

The output sinks section of the official guide says nothing about a DB sink, and the file sink does not support update mode either: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks

Currently I output it to the console, and I would like to store the data in files or a DB instead.

So my question is: how can I write the stream data to a DB or to files in my situation? Do I have to write the data back to Kafka and then use Kafka Connect to move it into files/a DB?

P.S. I followed these articles to get the aggregated streaming query:

- https://stackoverflow.com/questions/62738727/how-to-deduplicate-and-keep-latest-based-on-timestamp-field-in-spark-structured
- https://databricks.com/blog/2017/10/17/arbitrary-stateful-processing-in-apache-sparks-structured-streaming.html
- https://stackoverflow.com/questions/50933606/spark-streaming-select-record-with-max-timestamp-for-each-id-in-dataframe-pysp (will also try this one more time using the Java API)

Recommended Answer

I got confused between OUTPUT and WRITE. I had also wrongly assumed that DB and file sinks were parallel terms in the output sinks section of the doc, which is why no DB sink appears in that section of the guide: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks

I just realized that the output mode (append/update/complete) only constrains the streaming query itself; it has nothing to do with how the results are WRITTEN to the SINK. I also realized that writing to a DB can be achieved with the foreach sink (initially I understood it only as a hook for extra transformations).
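To make that concrete, here is a hedged sketch of the foreach sink writing each updated row over plain JDBC, building on the `deduped` / `Event` sketch above. The Postgres URL, credentials, and the `latest_events` table are assumptions for illustration only.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.ForeachWriter

// Writes each updated row to a (hypothetical) Postgres table via plain JDBC.
class JdbcForeachWriter extends ForeachWriter[Event] {
  private var conn: Connection = _
  private var stmt: PreparedStatement = _

  // Called once per partition per epoch; return true to process this partition.
  override def open(partitionId: Long, epochId: Long): Boolean = {
    conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/mydb", "user", "password")
    stmt = conn.prepareStatement(
      "INSERT INTO latest_events (id, ts, value) VALUES (?, ?, ?)")
    true
  }

  override def process(e: Event): Unit = {
    stmt.setString(1, e.id)
    stmt.setTimestamp(2, e.ts)
    stmt.setString(3, e.value)
    stmt.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (conn != null) conn.close()
  }
}

deduped.writeStream
  .outputMode("update")
  .foreach(new JdbcForeachWriter)
  .start()
```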

I found these articles/discussions useful.

So later on I read the official guide again and confirmed that foreachBatch can also run custom logic when WRITING to a STORAGE.
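As a sketch of that approach: foreachBatch hands you each micro-batch as an ordinary Dataset, so any batch writer works, including the built-in JDBC data source. The connection details and table name below are again placeholders, and it reuses `deduped` from the earlier sketch.

```scala
import org.apache.spark.sql.Dataset

deduped.writeStream
  .outputMode("update")
  .foreachBatch { (batch: Dataset[Event], batchId: Long) =>
    // Run the ordinary batch JDBC writer on every micro-batch.
    batch.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://localhost:5432/mydb")
      .option("dbtable", "latest_events")
      .option("user", "user")
      .option("password", "password")
      .mode("append")
      .save()
  }
  .start()
```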
