Dataflow can't read from BigQuery dataset in region "asia-northeast1"

Problem description


I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a Dataflow templated pipeline (running in Australia region) to read a table from it. It chucks the following error, even though the dataset/table does indeed exist:

Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}


Am I doing something wrong here?

import java.util.stream.Collectors;

import com.google.api.services.bigquery.model.TableRow;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

import static org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED;
import static org.apache.beam.sdk.io.FileBasedSink.CompressionType.GZIP;

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {
    public static void main(String[] args) throws Exception {
        PipelineOptionsFactory.register(TemplateOptions.class);
        TemplateOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(TemplateOptions.class);
        options.setAutoscalingAlgorithm(THROUGHPUT_BASED);
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
                .apply(ParDo.of(new DoFn<TableRow, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) throws Exception {
                        String commaSep = c.element().values()
                                .stream()
                                .map(cell -> cell.toString().trim())
                                .collect(Collectors.joining("\",\""));
                        c.output(commaSep);
                    }
                }))
                .apply(TextIO.write().to(options.getOutputFile())
                        .withoutSharding()
                        .withWritableByteChannelFactory(GZIP)
                );
        pipeline.run();
    }

    public interface TemplateOptions extends DataflowPipelineOptions {
        @Description("The BigQuery table to read from in the format project:dataset.table")
        @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
        ValueProvider<String> getBigQueryTableName();

        void setBigQueryTableName(ValueProvider<String> value);

        @Description("The name of the output file to produce in the format gs://bucket_name/filename.csv")
        @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
        ValueProvider<String> getOutputFile();

        void setOutputFile(ValueProvider<String> value);
    }
}

Args:

--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template


Job id: 2018-05-05_05_37_08-8260293482986343692

Answer


Sorry about that issue. It will be addressed in the upcoming Beam SDK 2.5.0 (you can try using the current head snapshot from the Beam repo).
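Until that fix is available, one workaround is to confirm where the dataset actually lives and then co-locate the data (or the job's resources) with it. A minimal sketch using the `bq` CLI, with the project and dataset names taken from the question; the table name and bucket name below are hypothetical:

```shell
# Confirm the dataset's location; pre-2.5.0 BigQueryIO did not look this
# up, which is what produces the spurious 404 for non-US/EU datasets.
bq show --format=prettyjson grey-sort-challenge:Konnichiwa_Tokyo
# The JSON output includes a "location" field, e.g. "asia-northeast1".

# One workaround: export the table to a GCS bucket created in that same
# region, and have the pipeline read the files instead of BigQueryIO.
# (table name "mytable" and bucket name are hypothetical)
bq extract --destination_format=CSV \
    'grey-sort-challenge:Konnichiwa_Tokyo.mytable' \
    'gs://my-asia-northeast1-bucket/export/*.csv'
```

Alternatively, copying the table into a dataset located in the same region as the Dataflow job (here, Australia) sidesteps the lookup entirely.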
