How clustering works in BigQuery

Question

I have a table UNITARCHIVE, partitioned by date and clustered by UNIT, DUID.

The total size of the table is 892 MB.

When I try this query:

SELECT * FROM `test-187010.ReportingDataset.UNITARCHIVE` WHERE duid="RRSF1" and unit="DUNIT"

BigQuery tells me it will process 892 MB. I thought clustering was supposed to reduce the amount of data scanned. I understand that when I filter by date the size is reduced dramatically, but I need the whole date range. Is this by design, or am I doing something wrong?

Recommended answer

To get the most benefit out of clustering, each partition needs to hold a certain amount of data.

For example, if the minimum size of a cluster is 100 MB (decided internally by BigQuery) and you have only 100 MB of data per day, then querying 100 days will scan 100 × 100 MB, regardless of the clustering strategy. With a table of only 892 MB spread across daily partitions, each partition is probably too small for clustering to skip any data.

As an alternative for this amount of data, partition by year instead of by day. That way you get the most benefit out of clustering even with a small amount of data per day.
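As a rough sketch of that suggestion, the table could be rebuilt with yearly partitions while keeping the same clustering columns. The column name `date_column` and the target table name `UNITARCHIVE_BY_YEAR` are placeholders, since the question does not show the schema. The sketch also assumes the partitioning column is of type DATE; for a TIMESTAMP column, `TIMESTAMP_TRUNC` would take the place of `DATE_TRUNC`.

```sql
-- Hypothetical rebuild of the table with one partition per year.
-- `date_column` and `UNITARCHIVE_BY_YEAR` are assumed names, not from the question.
CREATE TABLE `test-187010.ReportingDataset.UNITARCHIVE_BY_YEAR`
PARTITION BY DATE_TRUNC(date_column, YEAR)  -- yearly instead of daily partitions
CLUSTER BY UNIT, DUID                       -- same clustering columns as before
AS
SELECT *
FROM `test-187010.ReportingDataset.UNITARCHIVE`;
```

The original query can then be pointed at the new table. How much the scanned size drops depends on how the rows are distributed across UNIT and DUID.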

See "Partition by week/year/month to get over the partition limit?" for a reference table that illustrates this.
