BigQueryIO Read performance using withTemplateCompatibility

Views: 99 | Published: 2020/9/3 5:04:31 | Tags: google-cloud-dataflow, apache-beam


Problem description

Apache Beam 2.1.0 had a bug in template pipelines that read from BigQuery: such a pipeline could only be executed once. More details here: https://issues.apache.org/jira/browse/BEAM-2058

This was fixed in the Beam 2.2.0 release. You can now read from BigQuery using the withTemplateCompatibility option, and the template pipeline can be run multiple times.

  pipeline
    .apply("Read rows from table.",
           BigQueryIO.readTableRows()
                     .withTemplateCompatibility()
                     .from("<your-table>")
                     .withoutValidation());

This implementation seems to come with a huge performance cost for the BigQueryIO read operation: batch pipelines that previously ran in 8-11 minutes now consistently take 45-50 minutes to complete. The only difference between the two pipelines is the .withTemplateCompatibility() call.

I am trying to understand the reason for this large drop in performance and whether there is any way to improve it.

Thanks.

Solution, based on jkff's input: apply a Reshuffle immediately after the read.

  pipeline
    .apply("Read rows from table.",
           BigQueryIO.readTableRows()
                     .withTemplateCompatibility()
                     .from("<your-table>")
                     .withoutValidation())
    .apply("Reshuffle", Reshuffle.viaRandomKey());
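
The post does not say why the Reshuffle helps. The Beam Javadoc for withTemplateCompatibility notes that this read implementation does not support dynamic work rebalancing, so one plausible reading is that the rows stay concentrated on few workers unless they are redistributed explicitly. The sketch below is a toy model in plain Java, not Beam code: it only illustrates the idea behind Reshuffle.viaRandomKey(), which is to pair each element with a random key, group by that key, and then drop the key. The class name, the redistribute helper, and the shard count are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model only: plain Java, no Beam classes. It mimics the idea of
// redistributing elements by random key so that they spread across shards.
final class RandomKeyRedistribution {

    // Hypothetical helper (not a Beam API): draws a random key in
    // [0, shardCount) for each element and groups elements by that key.
    static <T> List<List<T>> redistribute(List<T> elements, int shardCount, Random rng) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (T element : elements) {
            int key = rng.nextInt(shardCount); // the "random key"
            shards.get(key).add(element);      // the "group by key"
        }
        return shards;                         // keys are dropped; only values remain
    }

    public static void main(String[] args) {
        // Stand-in for rows that all arrived on a single worker.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        List<List<Integer>> shards = redistribute(rows, 4, new Random(42));
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i).size() + " rows");
        }
    }
}
```

In the real pipeline the runner decides how the regrouped elements are assigned to workers; the fixed shard count here is just a stand-in for that. Whether this fully explains the 45-50 minute runs is an assumption, so it is worth comparing job timings with and without the Reshuffle step.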
