How to update a Static DataFrame with a Streaming DataFrame in Spark Structured Streaming
Question
I have a Static DataFrame with millions of rows as follows.
Static DataFrame:
+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527851|
|  2|1540525602|
|  3|1530529187|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+
Now in every batch a Streaming DataFrame is formed, which contains the id and an updated time_stamp after some operations, like below.
First batch:
+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527888|
|  2|1540525999|
|  3|1530529784|
+---+----------+
Now in every batch I want to update the Static DataFrame with the updated values from the Streaming DataFrame, as follows. How can I do that?
Static DF after the first batch:
+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527888|
|  2|1540525999|
|  3|1530529784|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+
I've already tried except(), union(), and a 'left_anti' join, but it seems Structured Streaming doesn't support such operations on a streaming DataFrame.
Answer
I resolved this issue with the foreachBatch sink introduced in Spark 2.4.0, which hands the streaming DataFrame to you as a series of mini-batch (static) DataFrames. For versions earlier than 2.4.0, it's still a headache.