BigQuery partitioned tables: inserting data from the past


Question

We want to start using partitioned tables in BQ, but the documentation (https://cloud.google.com/bigquery/docs/partitioned-tables) says that streaming inserts are possible only if the partitioning value is up to 7 days in the past or up to 3 days in the future.

In our case, some of our data could have a partition value more than 7 days in the past.

We save data via the BigQuery REST API.

Does this mean that we can't use partitioned tables, or is there a workaround? How can we save data that falls outside the bounds (7 days in the past, 3 days in the future) of a partitioned table?

In general, the idea is this: we have a table with streaming data (~100 records per minute), and we want to stream the data directly to partitions and then use the partitions for analytical queries.

Recommended answer

Summarizing the comment thread:

  • It's not possible to stream to partitions outside the window [7 days in the past, 3 days in the future]. This is a performance-related limitation, and the team is working to remove it.
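To make the window concrete, here is a minimal Python sketch of a check a client could run before streaming a row. The function name `within_streaming_window` is hypothetical, and the sketch assumes the 7-day and 3-day limits are inclusive and measured in whole calendar days relative to a caller-supplied "today"; the thread does not spell out those details.

```python
from datetime import date, timedelta

# Limits as quoted in the question; treating them as inclusive is an assumption.
MAX_DAYS_PAST = 7
MAX_DAYS_FUTURE = 3


def within_streaming_window(partition_day: date, today: date) -> bool:
    """Return True if partition_day lies in [today - 7 days, today + 3 days]."""
    earliest = today - timedelta(days=MAX_DAYS_PAST)
    latest = today + timedelta(days=MAX_DAYS_FUTURE)
    return earliest <= partition_day <= latest
```

Rows for which this returns False are the ones that cannot be streamed directly to their partition under the limitation described above.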

Workaround: stream your data to a non-partitioned table, then insert from there into the partitioned one.
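The second half of the workaround might look roughly like the sketch below, which only builds the SQL for copying one day of rows from the staging table into the partitioned table. Everything named here is an assumption for illustration: the helper `build_backfill_sql`, the table names, the columns, and the idea that the partitioned table is partitioned on a timestamp column and accepts a standard SQL `INSERT ... SELECT`. The answer itself does not say which mechanism to use for the insert.

```python
from datetime import date
from typing import Sequence


def build_backfill_sql(
    staging_table: str,
    partitioned_table: str,
    columns: Sequence[str],
    ts_column: str,
    day: date,
) -> str:
    """Build an INSERT ... SELECT that copies one day of rows.

    All table and column names are supplied by the caller; none of them
    come from the original thread.
    """
    cols = ", ".join(columns)
    return (
        f"INSERT INTO `{partitioned_table}` ({cols})\n"
        f"SELECT {cols}\n"
        f"FROM `{staging_table}`\n"
        f"WHERE DATE({ts_column}) = DATE '{day.isoformat()}'"
    )


# Hypothetical usage: copy the rows for 2016-01-02 out of the staging table.
sql = build_backfill_sql(
    staging_table="my_project.my_dataset.events_staging",
    partitioned_table="my_project.my_dataset.events_partitioned",
    columns=["event_ts", "payload"],
    ts_column="event_ts",
    day=date(2016, 1, 2),
)
# The statement could then be submitted as a query job, for example with the
# google-cloud-bigquery client: bigquery.Client().query(sql).result()
```

Running the copy per day keeps each job small and makes it easy to retry a single day; whether that granularity suits a given workload is a judgment call.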
