BigQuery cluster field usage/value not clear

Question

I created a table with clustering fields, but I don't see any cost saving or any performance improvement. This is what I have done:

I created a destination table with 3 columns (projectId, tableId and schema) using this SQL:

SELECT projectId, tableId, schema 
FROM `project.dataset.tables` 
WHERE _partitionTime >= '2018-12-27 00:00:00'

Partition field: the default _PARTITIONTIME. Cluster fields: projectId, tableId.

The original cost of this SQL is $2.82.

Now, when selecting from the new table, I expect:

  1. To reduce cost
  2. To get better performance

I'm using this SQL:

SELECT * FROM `project.table.testCluster` 
WHERE  projectId = 'xxx' and tableId = 'yyy' 
AND _PARTITIONTIME >= TIMESTAMP("2018-12-30") LIMIT 1000

Neither my benchmark nor the BigQuery console execution report shows either improvement.

Any idea why?

Recommended answer

BigQuery sorts the data in a clustered table based on the values in the clustering columns and organizes it into blocks. When you submit a query that contains a filter on a clustered column, BigQuery uses the clustering information to efficiently determine whether a block contains any data relevant to the query.

This lets BigQuery scan only the relevant blocks, a process referred to as block pruning.
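To make block pruning concrete, here is a minimal Python sketch of the idea. It is a toy model, not BigQuery's actual storage engine: the fixed row-count `block_size`, the helper names `build_blocks` and `prune`, and the sample data are all assumptions made for illustration.

```python
from typing import List, Tuple

# (projectId, tableId): the clustering key used in the question.
Row = Tuple[str, str]


def build_blocks(rows: List[Row], block_size: int) -> List[List[Row]]:
    """Sort rows by the clustering key and cut them into fixed-size blocks."""
    ordered = sorted(rows)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def prune(blocks: List[List[Row]], key: Row) -> List[int]:
    """Return the indexes of blocks whose [min, max] key range could hold `key`."""
    # Each block is sorted, so its first and last rows are its min and max keys.
    return [i for i, block in enumerate(blocks) if block[0] <= key <= block[-1]]


# Hypothetical data: 4 projects with 6 tables each.
rows = [(f"proj{p}", f"table{t}") for p in range(4) for t in range(6)]
blocks = build_blocks(rows, block_size=6)
to_scan = prune(blocks, ("proj2", "table3"))
print(f"scan {len(to_scan)} of {len(blocks)} blocks: {to_scan}")
```

In this toy layout only the block covering `proj2` has to be read. The other three are skipped using their min/max keys alone, which is the effect the answer calls block pruning.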

There is one small catch. Before running a query, BigQuery estimates how much data the query will process. Without clustering, that estimate is exact. With clustering, the estimate is only an upper bound: the query may end up processing less data, or the same amount. How much is saved depends on the structure of the clustered column. The more unique values the clustered column has, the smaller the optimization.
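The gap between the estimate and what is actually scanned can be sketched with a small Python simulation. This is only an illustration under stated assumptions: blocks are modeled as fixed row counts, the function `estimate_and_actual` is hypothetical, and "rows" stand in for bytes. It does not reproduce how BigQuery sizes blocks or bills queries.

```python
from typing import List, Tuple


def estimate_and_actual(keys: List[str], block_size: int, wanted: str) -> Tuple[int, int]:
    """Return (estimated_rows, scanned_rows) for an equality filter on `wanted`."""
    ordered = sorted(keys)
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    # Upper bound: assume every block in the selected partition is read.
    estimated = len(ordered)
    # Actual: only blocks whose [min, max] range can contain the wanted key.
    scanned = sum(len(b) for b in blocks if b[0] <= wanted <= b[-1])
    return estimated, scanned


# Hypothetical partition: 1000 rows, 5 distinct projectId values, 200 rows each.
keys = [f"proj{i % 5}" for i in range(1000)]

# Data spread over ten blocks: most blocks can be skipped.
many_blocks = estimate_and_actual(keys, block_size=100, wanted="proj3")
# Data small enough to fit in one block: nothing can be skipped.
one_block = estimate_and_actual(keys, block_size=1000, wanted="proj3")
print(many_blocks, one_block)
```

The second call shows the "may remain the same" case: when all the data for a partition lands in a single block, the filter on the clustered column cannot skip anything, so the scanned amount equals the estimate.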
