Make an existing BigQuery table clustered


Question

I have a very large existing partitioned table in BigQuery. I want to make the table clustered, at least for new partitions.

The documentation (https://cloud.google.com/bigquery/docs/creating-clustered-tables) says that a clustered table can be created when you load data, so I tried loading a new partition with clustering fields set: job_config.clustering_fields = ["event_type"].
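For context, a load like the one described might look roughly like the sketch below, using the google-cloud-bigquery Python client. Only the clustering_fields setting comes from the question; the function names, the daily partition decorator, and the newline-delimited JSON source format are assumptions made for illustration.

```python
from datetime import date


def partition_decorator(table_id: str, day: date) -> str:
    """Return the `table$YYYYMMDD` form that addresses one daily partition."""
    return f"{table_id}${day:%Y%m%d}"


def load_partition(table_id: str, day: date, source_uri: str) -> None:
    """Load files from GCS into one daily partition, requesting clustering."""
    # Imported here so the helper above can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        # Assumed file format; the question does not say what was loaded.
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    )
    job_config.clustering_fields = ["event_type"]  # the setting from the question

    job = client.load_table_from_uri(
        source_uri,
        partition_decorator(table_id, day),
        job_config=job_config,
    )
    job.result()  # block until the job finishes; raises if it failed
```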

The load finished successfully, but the new partition does not appear to be clustered. (I am not sure how to check whether it is clustered, but a query against that particular partition always scans all rows.)

Is there a good way to add clustering fields to an existing partitioned table?

Any comment, suggestion, or answer is much appreciated.

Many thanks,
Joshua

Recommended answer

You can only specify clustering columns when a table is created, so you cannot expect an existing non-clustered table, let alone just its new partitions, to become clustered.
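As for checking whether a table is clustered: one option is to read the table metadata, which is probably more reliable than inferring it from how many rows a query scans. The sketch below assumes the google-cloud-bigquery Python client; describe_clustering and report are hypothetical helper names. Note that this reports the table-level clustering specification, not anything per partition.

```python
from typing import Optional, Sequence


def describe_clustering(fields: Optional[Sequence[str]]) -> str:
    """Turn a table's clustering_fields value into a readable summary."""
    if not fields:
        return "not clustered"
    return "clustered by " + ", ".join(fields)


def report(table_id: str) -> str:
    """Fetch table metadata and summarize its clustering specification."""
    # Imported here so the helper above can be used without the client library.
    from google.cloud import bigquery

    table = bigquery.Client().get_table(table_id)
    # clustering_fields is None when the table has no clustering specification.
    return describe_clustering(table.clustering_fields)
```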

The "workaround" is to create a new table that is properly partitioned and clustered, and to load the data into it from Google Cloud Storage (GCS). Export the data from the original table to GCS first; since BigQuery export and load jobs are not billed, the whole process is free of charge apart from the GCS storage itself.
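The export-then-load workaround could be sketched as follows with the google-cloud-bigquery Python client. This is a minimal sketch under stated assumptions, not a definitive procedure: the function names, the bucket and prefix, the choice of newline-delimited JSON, and the assumption that the data has a DATE or TIMESTAMP column to partition on (partition_field) are all illustrative and not from the original answer.

```python
from typing import Sequence


def export_uri(bucket: str, prefix: str) -> str:
    """Wildcard URI, so a large export can be sharded across many files."""
    return f"gs://{bucket}/{prefix.strip('/')}/part-*.json"


def recreate_clustered(
    source_id: str,
    dest_id: str,
    bucket: str,
    partition_field: str,
    clustering_fields: Sequence[str],
) -> None:
    """Export source_id to GCS, then load it into a new clustered table."""
    # Imported here so the helper above can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    uri = export_uri(bucket, "clustering-migration")  # hypothetical prefix

    # Step 1: export the original table to GCS.
    extract_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
    )
    client.extract_table(source_id, uri, job_config=extract_config).result()

    # Step 2: load into a new table, declaring partitioning and clustering
    # up front. The schema is copied from the source table so that column
    # types survive the round trip through JSON.
    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        schema=client.get_table(source_id).schema,
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
            field=partition_field,  # assumed DATE/TIMESTAMP column
        ),
        clustering_fields=list(clustering_fields),
    )
    client.load_table_from_uri(uri, dest_id, job_config=load_config).result()
```

Once the new table has been verified, the original table and the exported files can be deleted to avoid paying for duplicate storage.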
