Make an existing BigQuery table clustered


Question

I have a very large existing partitioned table in BigQuery. I want to make the table clustered, at least for new partitions.

According to the documentation (https://cloud.google.com/bigquery/docs/creating-clustered-tables), a clustered table can be created when you load data, so I tried loading a new partition with clustering fields set: job_config.clustering_fields = ["event_type"].

The load finished successfully, but the new partition does not seem to be clustered. (I am not sure how to check whether it is clustered, but when I query that particular partition, it always scans all rows.)

Is there a good way to add clustering fields to an existing partitioned table?

Any comment, suggestion, or answer is much appreciated.

Many thanks, 优苏阿

Recommended answer

You can only specify clustering columns when a table is created.
So you cannot expect an existing non-clustered table, let alone just its new partitions, to become clustered.
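On the question of how to check whether a table is clustered: since clustering is a table-level property, it should show up in the table's metadata rather than per partition. The sketch below assumes you have the table resource as JSON (for example from `bq show --format=prettyjson`) and reads its `clustering.fields` entry. The helper name `clustering_fields` and the sample project, dataset, and table names are hypothetical.

```python
import json
from typing import List


def clustering_fields(table_resource: dict) -> List[str]:
    """Return the clustering columns in a table resource, or [] if there are none."""
    clustering = table_resource.get("clustering") or {}
    return list(clustering.get("fields") or [])


# Hypothetical, trimmed table metadata; the names are placeholders.
sample = json.loads("""
{
  "tableReference": {
    "projectId": "my-project",
    "datasetId": "mydataset",
    "tableId": "events"
  },
  "timePartitioning": {"type": "DAY"},
  "clustering": {"fields": ["event_type"]}
}
""")

print(clustering_fields(sample))
# A partitioned table without a "clustering" entry is not clustered.
print(clustering_fields({"timePartitioning": {"type": "DAY"}}))
```

An empty list here would suggest the table was never created with clustering columns, which would be consistent with queries scanning every row of the partition.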

The workaround is to create a new table that is properly partitioned and clustered, and load the data into it from Google Cloud Storage (GCS). You can first export the data from the original table to GCS, so that the whole process is free of charge in BigQuery.
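The export-then-load workaround might look roughly like the following when driven through the `bq` command-line tool. This is only a sketch: the table names, the bucket URI, the Parquet format, and the assumption that the table is partitioned on a column called `event_date` are all hypothetical, and an ingestion-time partitioned table would need different handling. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def build_recluster_commands(
    source_table: str,
    new_table: str,
    gcs_uri: str,
    partition_field: str,
    clustering_fields: List[str],
) -> List[List[str]]:
    """Build bq commands: export the old table to GCS, then load into a new clustered table."""
    # Step 1: export the original table to GCS (wildcard URI for large tables).
    extract_cmd = ["bq", "extract", "--destination_format=PARQUET", source_table, gcs_uri]
    # Step 2: load into a new table, setting partitioning and clustering at creation time.
    load_cmd = [
        "bq", "load", "--source_format=PARQUET",
        f"--time_partitioning_field={partition_field}",
        f"--clustering_fields={','.join(clustering_fields)}",
        new_table, gcs_uri,
    ]
    return [extract_cmd, load_cmd]


# All names below are placeholders, not values from the original question.
commands = build_recluster_commands(
    source_table="mydataset.events",
    new_table="mydataset.events_clustered",
    gcs_uri="gs://my-bucket/events/part-*.parquet",
    partition_field="event_date",
    clustering_fields=["event_type"],
)
for cmd in commands:
    print(shlex.join(cmd))
```

Before switching over to the new table, it is worth comparing row counts and column types against the original, since type round-tripping depends on the export format chosen.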
