BigQueryIO Read performance using withTemplateCompatibility


Problem description

Apache Beam 2.1.0 had a bug with template pipelines that read from BigQuery, which meant they could only be executed once. More details here: https://issues.apache.org/jira/browse/BEAM-2058

This has been fixed with the release of Beam 2.2.0: you can now read from BigQuery using the withTemplateCompatibility option, and your template pipeline can be run multiple times.

  pipeline
    .apply("Read rows from table."
         , BigQueryIO.readTableRows()
                     .withTemplateCompatibility()
                     .from("<your-table>")
                     .withoutValidation())

This implementation seems to come with a huge performance cost for the BigQueryIO read operation: batch pipelines that used to run in 8-11 minutes now consistently take 45-50 minutes to complete. The only difference between the two pipelines is the .withTemplateCompatibility().

I am trying to understand the reasons for the huge drop in performance and whether there is any way to improve it.

Thanks.

Solution, based on jkff's input:

  pipeline
    .apply("Read rows from table."
         , BigQueryIO.readTableRows()
                     .withTemplateCompatibility()
                     .from("<your-table>")
                     .withoutValidation())
    .apply("Reshuffle",  Reshuffle.viaRandomKey())

Answer

I suspect this is due to the fact that withTemplateCompatibility comes at the cost of disabling dynamic rebalancing for this read step.

I would expect it to have a significant impact only if you're reading a small or moderate amount of data but performing very heavy processing on it. In this case, try adding a Reshuffle.viaRandomKey() onto your BigQueryIO.read(). It will materialize a temporary copy of the data, but will parallelize downstream processing much better.
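
For illustration, here is a minimal, self-contained sketch of what such a pipeline could look like. The "<your-table>" table spec and the "Heavy processing" DoFn are placeholders, and the options/runner setup is assumed; the point is that the reshuffle sits between the template-compatible read and the expensive downstream step, so that work is spread across workers even though the read itself no longer rebalances dynamically.

  import com.google.api.services.bigquery.model.TableRow;
  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
  import org.apache.beam.sdk.options.PipelineOptions;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.ParDo;
  import org.apache.beam.sdk.transforms.Reshuffle;

  public class TemplateCompatibleReadPipeline {
    public static void main(String[] args) {
      PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
      Pipeline pipeline = Pipeline.create(options);

      pipeline
          // Template-compatible read; dynamic rebalancing is disabled for this step.
          .apply("Read rows from table",
              BigQueryIO.readTableRows()
                  .withTemplateCompatibility()
                  .from("<your-table>")        // placeholder table spec
                  .withoutValidation())
          // Materialize a temporary copy of the rows and redistribute them,
          // so the heavy ParDo below is parallelized independently of the read.
          .apply("Reshuffle", Reshuffle.viaRandomKey())
          .apply("Heavy processing", ParDo.of(new DoFn<TableRow, TableRow>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
              // Hypothetical expensive per-row work would go here.
              c.output(c.element());
            }
          }));

      pipeline.run();
    }
  }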
