Reading a BigQuery federated table as a source in Dataflow throws an error

Question

I have a federated source in BigQuery which is pointing to some CSV files in GCS.

When I try to read the federated BigQuery table as a source for a Dataflow pipeline, it throws the following error:

    1226 [main] ERROR com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator  - Error reading from BigQuery table Federated_test_dataflow of dataset CPT_7414_PLAYGROUND : 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "message" : "Cannot list a table of type EXTERNAL.",
        "reason" : "invalid"
      } ],
      "message" : "Cannot list a table of type EXTERNAL."
    }

Does Dataflow not support federated sources in BigQuery, or am I doing something wrong? I do know that I could read the files from GCS directly into my pipeline, but I'd prefer to work with BigQuery TableRow objects instead due to the design of the application.

    PCollection<TableRow> results = pipeline
        .apply("fed-test", BigQueryIO.Read.from("<project_id>:CPT_7414_PLAYGROUND.Federated_test_dataflow"))
        .apply(ParDo.of(new DoFn<TableRow, TableRow>() {
            @Override
            public void processElement(ProcessContext c) throws Exception {
                // Print each row; nothing is emitted downstream.
                System.out.println(c.element());
            }
        }));

Recommended answer

As Michael says, BigQuery does not support reading directly from EXTERNAL (federated) tables or VIEWs: even a plain read of such a table effectively requires a query.

To read from these tables in Dataflow, you can instead use

    BigQueryIO.Read.fromQuery("SELECT * FROM table_or_view_name")

which will issue the query, save the result to a temporary table, and then begin the read process from that table. Of course, this incurs the cost of a BigQuery query on every run, so if you need to read from the same VIEW or EXTERNAL table repeatedly, you may want to manually materialize it as a regular table once and read from that instead.
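
As a rough illustration of the query string passed to fromQuery, here is a minimal sketch of a helper that turns a table spec into a SELECT * query. The class name FederatedQuery and the method selectAllFrom are hypothetical and not part of the Dataflow SDK. The sketch also assumes the query runs as legacy SQL, where a fully qualified table is written in square brackets; adjust the quoting if your SDK version runs standard SQL.

```java
// Hypothetical helper (not part of the Dataflow SDK): builds the query string
// to hand to BigQueryIO.Read.fromQuery(...) for a federated table or a view.
final class FederatedQuery {
    private FederatedQuery() {}

    /**
     * Wraps a table spec such as "project:dataset.table" in a SELECT * query.
     * Assumption: the query is executed as legacy SQL, where table names are
     * written inside square brackets.
     */
    static String selectAllFrom(String tableSpec) {
        if (tableSpec == null || tableSpec.trim().isEmpty()) {
            throw new IllegalArgumentException("tableSpec must not be empty");
        }
        String spec = tableSpec.trim();
        // Reject brackets so the spec cannot break out of the quoted name.
        if (spec.contains("[") || spec.contains("]")) {
            throw new IllegalArgumentException("tableSpec must not contain brackets: " + spec);
        }
        return "SELECT * FROM [" + spec + "]";
    }
}
```

With this in place, the read from the question could be written as BigQueryIO.Read.fromQuery(FederatedQuery.selectAllFrom("<project_id>:CPT_7414_PLAYGROUND.Federated_test_dataflow")) in place of BigQueryIO.Read.from(...).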
