How to write streaming Dataset to Cassandra?

Question
So I have a Python stream-sourced DataFrame `df` that contains all the data I want to write to a Cassandra table using the spark-cassandra-connector. I've tried doing this in two ways:
```python
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table="myTable", keyspace="myKeySpace") \
    .save()
```

and:

```python
query = df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .outputMode('append') \
    .options(table="myTable", keyspace="myKeySpace") \
    .start()

query.awaitTermination()
```
However, I keep getting these errors, respectively:

```
pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;
```
and

```
java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.
```
Is there any way I can send my streaming DataFrame into my Cassandra table?
Answer
There is currently no streaming Sink for Cassandra in the Spark Cassandra Connector. You will need to implement your own Sink or wait for it to become available.
If you were using Scala or Java, you could use the foreach operator with a ForeachWriter, as described in Using Foreach.