How to write streaming Dataset to Cassandra?


Question


So I have a stream-sourced DataFrame df in Python that has all the data I want to place into a Cassandra table with the spark-cassandra-connector. I've tried doing this in two ways:

df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .save() 

query = df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .outputMode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .start()

query.awaitTermination()


However, I keep getting these errors, respectively:

pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;

java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.


Is there any way I can send my streamed DataFrame into my Cassandra table?

Answer


There is currently no streaming Sink for Cassandra in the Spark Cassandra Connector. You will need to implement your own Sink or wait for it to become available.


If you were using Scala or Java, you could use the foreach operator and implement a ForeachWriter, as described in Using Foreach.
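For what it's worth, a ForeachWriter follows a simple three-method lifecycle: open(partitionId, epochId) is called once per partition, process(row) once per row, and close(error) at the end. Since Spark 2.4, PySpark's df.writeStream.foreach(writer) accepts any plain Python object with those methods, so the same pattern became usable from Python. Below is a minimal sketch of that contract; the Cassandra calls themselves are stubbed out (a real writer would open a cassandra-driver session in open and issue INSERTs in process; the class name and the buffer are illustrative, not connector API):

```python
# Sketch of a ForeachWriter-style sink, assuming PySpark >= 2.4 where
# df.writeStream.foreach(writer) accepts any object exposing
# open/process/close. Cassandra interaction is stubbed: a real writer
# would connect in open() and execute INSERT statements in process().

class CassandraRowWriter:
    """Hypothetical per-partition writer following the ForeachWriter contract."""

    def open(self, partition_id, epoch_id):
        # Called once per partition per epoch; return True to receive rows.
        # A real implementation would open a Cassandra session here.
        self.buffer = []
        return True

    def process(self, row):
        # Called once per row; a real implementation would run an INSERT.
        self.buffer.append(row)

    def close(self, error):
        # Called when the partition finishes (error is None on success);
        # a real implementation would flush pending writes and disconnect.
        self.buffer = []

# With a streaming DataFrame df, this would be wired up as:
# query = df.writeStream.foreach(CassandraRowWriter()).start()
```

Note that Spark may call the writer more than once for the same data on retries, so the insert logic should be idempotent (which plain Cassandra INSERTs by primary key usually are).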
