Apache Beam:以编程方式创建分区表 [英] Apache beam : Programatically create partitioned tables
问题描述
我正在编写一个云数据流,该数据流从Pubsub读取消息并将其存储到BigQuery中.我想使用分区表(按日期),并且正在使用与message关联的Timestamp
来确定消息应该进入哪个分区.下面是我的代码:
I am writing a cloud dataflow that reads messages from Pubsub and stores those into BigQuery. I want to use partitioned table (by date) and I am using Timestamp
associated with message to determine which partition the message should go into. Below is my code:
BigQueryIO.writeTableRows()
.to(new SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>() {
private static final long serialVersionUID = 1L;
@Override
public TableDestination apply(ValueInSingleWindow<TableRow> value) {
log.info("Row value : {}", value.getValue());
Instant timestamp = value.getTimestamp();
String partition = DateTimeFormat.forPattern("yyyyMMdd").print(timestamp);
TableDestination td = new TableDestination(
"<project>:<dataset>.<table>" + "$" + partition, null);
log.info("Table Destination : {}", td);
return td;
}
})
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withSchema(tableSchema);
部署数据流时,我可以在Stackdriver中看到日志语句,但是,消息没有插入到BigQuery表中,并且出现以下错误:
When I deploy the dataflow, I can see the log statements in Stackdriver, however, the messages do not get inserted into BigQuery tables and I get the following error:
Request failed with code 400, will NOT retry: https://www.googleapis.com/bigquery/v2/projects/<project_id>/datasets/<dataset_id>/tables
severity: "WARNING"
因此,它似乎无法创建表,从而导致插入失败.我是否需要更改数据流定义才能使其正常工作?如果没有,还有其他方法可以通过编程方式创建分区表吗?
So, it looks like it is not able to create a table, resulting in insert failure. Do I need to change the dataflow definition in order to make this work? If not, is there any other way to create the partitioned tables programmatically?
我正在使用Apache Beam 2.0.0.
I am using Apache beam 2.0.0.
推荐答案
这是 a BigQueryIO中的错误,并已在Beam 2.2中修复.您可以使用Beam的快照版本,也可以等到2.2版完成(当前正在执行发布过程).
This was a bug in BigQueryIO and it has been fixed in Beam 2.2. You can use a snapshot version of Beam, or wait until release 2.2 is finalized (the release process is currently in progress).
这篇关于Apache Beam:以编程方式创建分区表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!