Apache Spark's Structured Streaming with Google Pub/Sub
Question
I'm using Spark DStreams to pull and process data from Google Pub/Sub.
I'm looking for a way to move to Structured Streaming while still using Pub/Sub.
I should also mention that my messages are Snappy-compressed in Pub/Sub.
I found this issue, which claims that using Pub/Sub with Structured Streaming is not supported.
Has anyone encountered this problem? Is it possible to implement a custom Receiver to read the data from Pub/Sub?
Thanks
Recommended answer
The feature request you referenced is still accurate: Cloud Pub/Sub has no concept of an offset for tracking your read position, so Structured Streaming with Cloud Pub/Sub is not supported.
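To illustrate why the missing offset matters, here is a minimal, deliberately simplified sketch in plain Python. It assumes a toy model only: `OffsetLog` and `AckQueue` are hypothetical names invented for this illustration, and neither is a real Spark or Pub/Sub API. The first class behaves like an offset-addressed source, where a batch can be re-read by position after a failure. The second behaves like an ack-based subscription, where messages are identified by ack IDs and there is no position to resume from.

```python
from collections import OrderedDict


class OffsetLog:
    """Toy offset-addressed log: records keep their position and any range can be re-read."""

    def __init__(self, records):
        self._records = list(records)

    def latest_offset(self):
        return len(self._records)

    def read(self, start, end):
        # Reading has no side effects, so the same range can be replayed after a failure.
        return self._records[start:end]


class AckQueue:
    """Toy ack-based subscription: messages are identified by ack IDs, not by position."""

    def __init__(self, messages):
        self._pending = OrderedDict((f"ack-{i}", m) for i, m in enumerate(messages))

    def pull(self, max_messages):
        # Returns (ack_id, message) pairs that have not been acknowledged yet.
        return list(self._pending.items())[:max_messages]

    def ack(self, ack_ids):
        # In this toy model an acknowledged message is never delivered again.
        for ack_id in ack_ids:
            self._pending.pop(ack_id, None)


log = OffsetLog(["a", "b", "c"])
first_read = log.read(0, 2)
replayed = log.read(0, 2)  # same batch again, e.g. when recovering from a checkpoint

queue = AckQueue(["a", "b", "c"])
batch = queue.pull(2)
queue.ack([ack_id for ack_id, _ in batch])
after_ack = queue.pull(2)  # the acknowledged batch cannot be requested again by position
```

In the offset model, a consumer can record the range `(0, 2)` and re-read exactly that range later. The ack model in this sketch gives the consumer nothing equivalent to record, which is the gap the answer above describes.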