Refresh DataFrame in Spark real-time streaming without stopping the process

Problem Description

In my application I get a stream of accounts from a Kafka queue (using Spark Streaming with Kafka).

I need to fetch attributes related to these accounts from S3, so I'm planning to cache the resulting S3 DataFrame, since the S3 data will not be updated more than once a day for now (though that might soon change to every hour or every 10 minutes). So the question is: how can I refresh the cached DataFrame periodically without stopping the process?

Update: I'm planning to publish an event to Kafka whenever there is an update in S3, using SNS and AWS Lambda. My streaming application will subscribe to that event and refresh the cached DataFrame based on it (basically unpersist() the cache and reload from S3). Is this a good approach?
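
Below is a minimal sketch of that unpersist()-and-reload idea on the driver side, assuming the S3 attributes are read with Spark SQL and cached as a DataFrame. The object name, S3 path, and Parquet format are hypothetical placeholders, not details from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RefreshableCache {
  private val spark = SparkSession.builder().appName("account-enrichment").getOrCreate()

  // Mutable reference to the currently cached S3 attribute data.
  @volatile private var s3Attributes: DataFrame = loadFromS3()

  // Hypothetical S3 path; the real bucket/prefix would come from configuration.
  private def loadFromS3(): DataFrame =
    spark.read.parquet("s3a://my-bucket/account-attributes/").cache()

  // Call this when an "S3 updated" event arrives on the SNS -> Lambda -> Kafka topic.
  def refresh(): Unit = synchronized {
    val old = s3Attributes
    s3Attributes = loadFromS3() // cache the new snapshot first
    old.unpersist()             // then release the stale cached data
  }

  // Each micro-batch joins against this so it always sees the latest snapshot.
  def current: DataFrame = s3Attributes
}
```

In this sketch, each micro-batch would join against RefreshableCache.current, and the consumer of the update topic would call RefreshableCache.refresh() whenever it sees an event.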

Recommended Answer

As far as I know, the only way to do what you're asking is to reload the DataFrame from S3 when new data arrives, which means you also have to recreate the streaming DF and restart the query. This is because DataFrames are fundamentally immutable.
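
As a rough illustration of "recreate the streaming DF and restart the query", the sketch below stops a Structured Streaming query and starts a new one after re-reading the static S3 DataFrame. The Kafka topic, broker address, join key, and S3 paths are all assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

val spark = SparkSession.builder().appName("account-enrichment").getOrCreate()

def startQuery(): StreamingQuery = {
  // Re-read the S3 attributes so the recreated query sees the latest snapshot.
  val attributes = spark.read.parquet("s3a://my-bucket/account-attributes/")

  val accounts = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "accounts")
    .load()
    .selectExpr("CAST(value AS STRING) AS account_id") // simplified record parsing

  // Stream-static inner join; "account_id" is a hypothetical join key.
  accounts.join(attributes, Seq("account_id"))
    .writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/enriched/")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/enriched/")
    .start()
}

var query: StreamingQuery = startQuery()

// When S3 is known to have changed, stop the running query and start a new
// one so the freshly read static DataFrame replaces the old one.
def restartOnS3Update(): Unit = {
  query.stop()
  query = startQuery()
}
```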

If you want to update (mutate) data in a DataFrame without reloading it, you need to try one of the datastores that integrate with or connect to Spark and allow mutations. One that I'm aware of is SnappyData.
