Optimizing repeated transformations in Apache Beam/DataFlow

Views: 39 | Published: 2021/11/11 22:34:29 | Tags: google-cloud-dataflow, apache-beam

Question

I wonder if Apache Beam/Google Dataflow is smart enough to recognize repeated transformations in the dataflow graph and run them only once. For example, if I have 2 branches:

p | GroupByKey() | FlatMap(...)
p | combiners.Top.PerKey(...) | FlatMap(...)

Both will involve grouping elements by key under the hood. Will the execution engine recognize that GroupByKey() has the same input in both cases and run it only once? Or do I need to manually ensure that GroupByKey() precedes all the branches where it is used?
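
For reference, the manual alternative mentioned above (grouping once and feeding the grouped result to every branch) might look roughly like the sketch below. It assumes the Beam Python SDK. The sample input and the helpers `expand_group` and `top_n` are hypothetical stand-ins for the elided `FlatMap(...)` bodies. The second branch swaps `combiners.Top.PerKey(...)` for a plain `heapq.nlargest` over the already-grouped values, so it shows the shape of the manual approach, not what `Top.PerKey` does internally or what a runner may optimize on its own.

```python
import heapq


def expand_group(kv):
    """Branch 1 (hypothetical): emit one (key, value) pair per grouped value."""
    key, values = kv
    return [(key, v) for v in values]


def top_n(kv, n=2):
    """Branch 2 (hypothetical): keep the n largest values of an already-grouped key."""
    key, values = kv
    return key, heapq.nlargest(n, values)


def run():
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        # Hypothetical keyed input standing in for `p` in the question.
        p = pipeline | beam.Create([("a", 1), ("a", 5), ("a", 3), ("b", 2)])

        # Group once and keep a handle on the grouped PCollection.
        grouped = p | "GroupOnce" >> beam.GroupByKey()

        # Both branches consume the same grouped PCollection,
        # so the pipeline graph contains a single GroupByKey step.
        branch_1 = grouped | "Expand" >> beam.FlatMap(expand_group)
        branch_2 = grouped | "TopN" >> beam.Map(top_n, n=2)

        branch_1 | "PrintExpanded" >> beam.Map(print)
        branch_2 | "PrintTop" >> beam.Map(print)
```

Calling `run()` executes the pipeline on the default runner. Because both branches start from `grouped`, the graph has one `GroupByKey` regardless of how the runner treats repeated transforms.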
