Force update of SideInput on updating Dataflow pipeline

Question

I have a running Dataflow pipeline that fetches a configuration of active tenants (stored in GCS) and feeds it into an ActiveTenantFilter as a side input. The configuration is rarely updated, so I decided to re-deploy the pipeline with the --update flag whenever it changes.

However, when the --update flag is used, the file is not fetched again; the existing state is maintained. Is it possible to enforce that this PCollectionView is updated whenever the pipeline is re-deployed?

Recommended answer

You are correct: when you --update a pipeline, it processes new data but does not re-load old data. It sounds like what you want is slowly updating side inputs, which unfortunately have not been implemented yet. You could instead try draining and restarting your pipeline.
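
If draining and restarting is too disruptive, a different technique (not the side-input approach from the question) is to let the filter re-read the configuration itself once a time-to-live has elapsed. The following is only a minimal sketch in plain Java with no Beam dependency: the class name RefreshingTenantConfig, the loader (a stand-in for reading the file from GCS), and the injectable clock are all hypothetical names introduced for illustration.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: caches the active-tenant set and reloads it after a TTL. */
public class RefreshingTenantConfig {
    private final Supplier<Set<String>> loader; // assumption: stands in for reading the config from GCS
    private final long ttlMillis;
    private final LongSupplier clock;           // injectable time source (milliseconds)
    private Set<String> tenants;                // null until the first load
    private long loadedAtMillis;

    public RefreshingTenantConfig(Supplier<Set<String>> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns true if the tenant is in the most recently loaded configuration. */
    public synchronized boolean isActive(String tenantId) {
        long now = clock.getAsLong();
        // Reload on first use, or once the TTL has elapsed since the last load.
        if (tenants == null || now - loadedAtMillis >= ttlMillis) {
            tenants = new HashSet<>(loader.get());
            loadedAtMillis = now;
        }
        return tenants.contains(tenantId);
    }
}
```

In a pipeline, such a helper would presumably be created in the filter's setup step and held in a transient field, with System::currentTimeMillis as the clock. The trade-off is that a configuration change becomes visible only after the TTL expires, and each worker reloads the file independently.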
