How clustering works in BigQuery

Views: 47 Published: 2021/5/12 18:31:31 google-bigquery

Question

I have a table UNITARCHIVE, partitioned by date and clustered by UNIT, DUID.

The total size of the table is 892 MB.

When I try this query:

SELECT * FROM `test-187010.ReportingDataset.UNITARCHIVE` WHERE duid="RRSF1" and unit="DUNIT"

BigQuery tells me it will process 892 MB. I thought clustering was supposed to reduce the amount of data scanned. I understand that when I filter by date the scanned size drops dramatically, but I need the whole date range. Is this by design, or am I doing something wrong?

Recommended answer

To get the most benefit out of clustering, each partition needs to hold a certain amount of data.

For example, if the minimum size of a cluster is 100 MB (decided internally by BigQuery) and you have only 100 MB of data per day, then querying 100 days will scan 100 * 100 MB, regardless of the clustering strategy.

As an alternative with this amount of data, partition by year instead of by day. Each partition then holds enough data for clustering to prune effectively, even with a low amount of data per day.
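The arithmetic behind this advice can be sketched with a toy model. This is not BigQuery's actual pruning algorithm, only an illustration under stated assumptions: the 100 MB minimum block size is the hypothetical figure from the example above, the function name `estimated_scan_mb` is made up for this sketch, and the filter on the clustering columns is assumed to match about 1 MB of rows per day.

```python
def estimated_scan_mb(partition_mb: int, matching_mb: int, min_block_mb: int = 100) -> int:
    """Toy estimate of MB scanned in one partition for a filter on the clustering columns.

    Assumption: a partition no larger than min_block_mb cannot be split into
    clustered blocks, so it is read in full; otherwise only the blocks that
    hold matching rows are read.
    """
    if partition_mb <= min_block_mb:
        # Partition is too small to benefit from clustering: read all of it.
        return partition_mb
    # Ceiling division: number of blocks needed to hold the matching rows.
    blocks_touched = -(-matching_mb // min_block_mb)
    return min(partition_mb, blocks_touched * min_block_mb)


# 100 daily partitions of 100 MB each, about 1 MB of matching rows per day (assumed).
daily = sum(estimated_scan_mb(100, 1) for _ in range(100))

# The same data in a single yearly partition: 10,000 MB, 100 MB of matching rows.
yearly = estimated_scan_mb(100 * 100, 100)

print(daily, yearly)
```

Under these assumptions the daily layout reads every partition in full, while the yearly layout reads only the block that holds the matching rows.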

See "Partition by week/year/month to get over the partition limit?" for a reference table that demonstrates this.
