Dataflow can't read from BigQuery dataset in region "asia-northeast1"
Question
I have a BigQuery dataset located in the new "asia-northeast1" region. I'm trying to run a Dataflow templated pipeline (running in Australia region) to read a table from it. It chucks the following error, even though the dataset/table does indeed exist:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
{
"code" : 404,
"errors" : [ {
"domain" : "global",
"message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo",
"reason" : "notFound"
} ],
"message" : "Not found: Dataset grey-sort-challenge:Konnichiwa_Tokyo"
}
Am I doing something wrong here?
import static org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED;
import static org.apache.beam.sdk.io.FileBasedSink.CompressionType.GZIP;

import com.google.api.services.bigquery.model.TableRow;
import java.util.stream.Collectors;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

/**
 * BigQuery -> ParDo -> GCS (one file)
 */
public class BigQueryTableToOneFile {

  public static void main(String[] args) throws Exception {
    PipelineOptionsFactory.register(TemplateOptions.class);
    TemplateOptions options = PipelineOptionsFactory
        .fromArgs(args)
        .withValidation()
        .as(TemplateOptions.class);
    options.setAutoscalingAlgorithm(THROUGHPUT_BASED);

    Pipeline pipeline = Pipeline.create(options);
    pipeline.apply(BigQueryIO.read().from(options.getBigQueryTableName()).withoutValidation())
        .apply(ParDo.of(new DoFn<TableRow, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) throws Exception {
            // Join the row's cell values into one comma-separated line.
            String commaSep = c.element().values()
                .stream()
                .map(cell -> cell.toString().trim())
                .collect(Collectors.joining("\",\""));
            c.output(commaSep);
          }
        }))
        .apply(TextIO.write().to(options.getOutputFile())
            .withoutSharding() // force a single output file
            .withWritableByteChannelFactory(GZIP));

    pipeline.run();
  }

  public interface TemplateOptions extends DataflowPipelineOptions {
    @Description("The BigQuery table to read from, in the format project:dataset.table")
    @Default.String("bigquery-samples:wikipedia_benchmark.Wiki1k")
    ValueProvider<String> getBigQueryTableName();
    void setBigQueryTableName(ValueProvider<String> value);

    @Description("The name of the output file to produce, in the format gs://bucket_name/filename.csv")
    @Default.String("gs://bigquery-table-to-one-file/output/bar.csv.gz")
    ValueProvider<String> getOutputFile();
    void setOutputFile(ValueProvider<String> value);
  }
}
Args:
--project=grey-sort-challenge
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=1
--zone=australia-southeast1-a
--stagingLocation=gs://bigquery-table-to-one-file/jars
--tempLocation=gs://bigquery-table-to-one-file/tmp
--templateLocation=gs://bigquery-table-to-one-file/template
Job ID: 2018-05-05_05_37_08-8260293482986343692
Answer
Sorry about that issue. It will be addressed in the upcoming Beam SDK 2.5.0 (you can try using a current HEAD snapshot from the Beam repo in the meantime).
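As a sketch of how to pick up the fix in a Maven build (assuming your project uses the standard Beam artifacts; version numbers here are illustrative), you would bump the Beam dependencies to 2.5.0 once it is released:

```xml
<!-- Sketch: pin the Beam SDK and Dataflow runner to the release carrying the fix. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>2.5.0</version>
</dependency>
```

Until the release, the equivalent would be a `-SNAPSHOT` version built from Beam's master branch; remember to re-stage the template after upgrading, since the SDK version is baked in at template creation time.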