数据流中的迭代处理 [英] Iterative processing in Dataflow
问题描述
此处所示.数据流管道由固定的DAG表示.我想知道是否有可能实现一个流水线,在该流水线进行处理,直到根据到目前为止计算的数据满足动态评估的条件为止.
As shown here Dataflow pipelines are represented by a fixed DAG. I'm wondering if it's possible to implement a pipeline where the processing proceeds until a dynamically evaluated condition is satisfied based on the data computed so far.
下面是一些伪代码来说明我要实现的内容:
Here's some pseudo code to illustrate what I'd like to implement:
PCollection pco = null
while(true):
pco = pco.apply(someTransform())
if (conditionSatisfied(pco)):
break
pco.Write()
推荐答案
似乎您真的想要迭代计算.目前,Dataflow尚不为此提供支持,但我们知道这是一个非常重要的用例,我们正在努力寻找正确的API集来表达它.
It seems like you really want iterative computations. Right now Dataflow does not provide support for that, but we are aware that it is a very important use case and we are working on finding the right set of APIs to express it.
目前,您的解决方法是:
For now your workarounds are:
- 反复运行整个管道(运行管道,检查输出,如果不满足条件则再次运行等).这显然具有管道设置和拆卸开销的缺点.
- 通过.apply()无条件循环地构建具有硬编码迭代次数的管道,然后运行整个管道.
- 两者的组合,例如运行固定的5次迭代管道,直到您对结果满意为止.
这篇关于数据流中的迭代处理的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!