Set maximumBillingTier when reading from BigQuery in Dataflow


Question

I'm running a GCP Dataflow job that reads data from BigQuery as a query result. I'm using google-cloud-dataflow-java-sdk-all version 1.9.0. The code fragment that sets up the pipeline looks like this:

PCollection<TableRow> myRows = pipeline.apply(BigQueryIO.Read
            .fromQuery(query)
            .usingStandardSql()
            .withoutResultFlattening()
            .named("Input " + tableId)
    );

The query is quite complex, which results in the following error message:

Query exceeded resource limits for tier 1. Tier 8 or higher required., error: Query exceeded resource limits for tier 1. Tier 8 or higher required.

I'd like to set maximumBillingTier, as can be done in the Web UI or with the bq tool. I can't find any way to do so, except for setting a default for the entire project, which is unfortunately not an option.

I tried to set it through these, without success:

  • DataflowPipelineOptions - neither this nor any interface it extends seems to have that setting
  • BigQueryIO.Read.Bound - I would expect it to be there right next to usingStandardSql and the other similar options, but it is not there
  • JobConfigurationQuery - this class has all the relevant settings, but it does not seem to be used at all when setting up a pipeline

Is there any way to pass this setting from within a Dataflow job?

Answer

Maybe a Googler will correct me, but it looks like you are right. I can't see this parameter exposed either. I checked both the Dataflow and the Beam APIs.

Under the hood, Dataflow is using JobConfigurationQuery from the BigQuery API, but it simply doesn't expose that parameter through its own API.

One workaround I see is to first run your complex query using the BigQuery API directly - before dropping into your pipeline. That way you can set the max billing tier through the JobConfigurationQuery class. Write the results of that query to another table in BigQuery.
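A minimal sketch of that workaround, using the low-level com.google.api.services.bigquery model classes that ship with the Dataflow 1.9.0 SDK. The bigquery client is assumed to be already authenticated, the project/dataset/staging-table names are placeholders, and waiting for job completion is omitted:

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Job;
import com.google.api.services.bigquery.model.JobConfiguration;
import com.google.api.services.bigquery.model.JobConfigurationQuery;
import com.google.api.services.bigquery.model.TableReference;

// Staging table that will hold the result of the complex query
// (placeholder project/dataset/table names).
TableReference stagingTable = new TableReference()
        .setProjectId("my-project")
        .setDatasetId("my_dataset")
        .setTableId("complex_query_result");

// Query configuration with the maximumBillingTier that BigQueryIO does not expose.
JobConfigurationQuery queryConfig = new JobConfigurationQuery()
        .setQuery(query)
        .setUseLegacySql(false)                // standard SQL, as in usingStandardSql()
        .setMaximumBillingTier(8)              // the tier the error message asks for
        .setDestinationTable(stagingTable)
        .setWriteDisposition("WRITE_TRUNCATE");

// 'bigquery' is an already-authenticated Bigquery client; insert the job and
// wait for it to reach the DONE state before launching the Dataflow pipeline.
Job queryJob = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig));
bigquery.jobs().insert("my-project", queryJob).execute();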

Then finally, in your pipeline, just read in the table which was created from the complex query.
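In the pipeline, that read then becomes a plain table read of the staging table (same placeholder table spec as in the sketch above):

PCollection<TableRow> myRows = pipeline.apply(BigQueryIO.Read
            .named("Input " + tableId)
            .from("my-project:my_dataset.complex_query_result")  // staging table written above
    );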
