Wildcard table matches with _TABLE_SUFFIX and sub-query

Question

The _TABLE_SUFFIX feature is great and exactly what I need to solve my problem. However, when I use a subquery to determine which tables to match, the query scans all of the data matched by the wildcard.

If I filter on _TABLE_SUFFIX with a constant comparison such as =, BETWEEN, or IN over a fixed set of values, the amount of data scanned goes down compared to using the bare wildcard:

SELECT sample_data FROM `test.dataset.*`
  WHERE _TABLE_SUFFIX IN ("NWD1", "NWD2", "NWD3", "NWD4", "NWD5")

(1.8 GB scanned)

However, if I do the following:

SELECT sample_data FROM `test.dataset.*`
  WHERE _TABLE_SUFFIX IN (SELECT ID FROM subset)

(50 GB scanned, even though the subquery returns the same values as the explicit IN list above)

Solution

Constant filters on _TABLE_SUFFIX reduce the amount of data queried, but filters whose values come from a dynamic subquery do not.
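
The difference can be illustrated with a toy Python model. It assumes, as a simplification and not as a description of BigQuery's actual planner, that the set of tables to read is fixed during planning, before any subquery has produced values. The table names and sizes (`TABLE_SIZES_MB`) and the helpers `plan` and `execute` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical tables matched by `test.dataset.*`: suffix -> size in MB.
# Names and sizes are made up for illustration.
TABLE_SIZES_MB: Dict[str, int] = {f"NWD{i}": 100 for i in range(1, 11)}


def plan(constant_suffixes: Optional[Set[str]]) -> List[str]:
    """Choose tables at planning time, before any subquery has run.

    A constant IN list is already known here, so non-matching tables are dropped.
    None stands for a filter fed by a subquery: its values are not known yet,
    so every table matched by the wildcard must be kept.
    """
    if constant_suffixes is None:
        return sorted(TABLE_SIZES_MB)
    return sorted(s for s in TABLE_SIZES_MB if s in constant_suffixes)


def execute(tables: List[str], subquery: Callable[[], Set[str]]) -> Tuple[List[str], int]:
    """Read every planned table, then apply the suffix filter."""
    wanted = subquery()  # the subquery's values are only available at run time
    scanned_mb = sum(TABLE_SIZES_MB[t] for t in tables)  # all planned tables are read
    kept = [t for t in tables if t in wanted]  # filtering happens after the read
    return kept, scanned_mb


ids = {"NWD1", "NWD2", "NWD3", "NWD4", "NWD5"}
constant_run = execute(plan(ids), lambda: ids)   # like the explicit IN (...) list
subquery_run = execute(plan(None), lambda: ids)  # like IN (SELECT ID FROM subset)
print(constant_run)  # (['NWD1', 'NWD2', 'NWD3', 'NWD4', 'NWD5'], 500)
print(subquery_run)  # (['NWD1', 'NWD2', 'NWD3', 'NWD4', 'NWD5'], 1000)
```

Both runs return the same tables, but only the constant filter lets `plan` drop tables before they are read, which mirrors the 1.8 GB versus 50 GB gap in the question.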

As an alternative, have you considered clustered storage?

Clustered tables can reduce the amount of data queried even when the filter values come from a dynamic subquery.

SELECT sample_data FROM `test.dataset.*`
  WHERE clustered_column IN (SELECT ID FROM subset)

would work, where clustered_column is the column the tables are clustered on.
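
Unlike table selection for _TABLE_SUFFIX, pruning on a clustered column can use values that only become known while the query runs. The toy Python model below assumes, as a simplification, that a clustered table is stored as blocks sorted by the clustered column, each carrying min/max metadata; it is not a description of BigQuery's real storage format. `Block`, `build_blocks`, `scan`, `run_subquery` and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Row = Tuple[str, int]  # (clustered_column value, payload)


@dataclass
class Block:
    min_key: str  # smallest clustered_column value stored in the block
    max_key: str  # largest clustered_column value stored in the block
    rows: List[Row]


def build_blocks(rows: List[Row], block_size: int) -> List[Block]:
    """Sort rows by the clustered column and cut them into fixed-size blocks."""
    ordered = sorted(rows)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(Block(chunk[0][0], chunk[-1][0], chunk))
    return blocks


def scan(blocks: List[Block], wanted: Set[str]) -> Tuple[List[Row], int]:
    """Skip blocks whose [min_key, max_key] range cannot hold any wanted value."""
    matched: List[Row] = []
    blocks_read = 0
    for block in blocks:
        if not any(block.min_key <= key <= block.max_key for key in wanted):
            continue  # pruned using block metadata only, without reading its rows
        blocks_read += 1
        matched.extend(row for row in block.rows if row[0] in wanted)
    return matched, blocks_read


def run_subquery() -> Set[str]:
    """Hypothetical stand-in for (SELECT ID FROM subset), evaluated at run time."""
    return {"b", "c", "f"}


rows = [(key, n) for key in "abcdefgh" for n in range(4)]  # 8 keys x 4 rows
blocks = build_blocks(rows, block_size=4)  # one key per block, 8 blocks
matched, blocks_read = scan(blocks, run_subquery())
print(blocks_read, len(blocks))  # 3 8
```

Only 3 of the 8 blocks are read, even though the filter values come from a function call rather than a constant list. In practice pruning is coarser when a block spans many key values.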

For example, in the following query the list of wiki values comes from a subquery:

SELECT MAX(title), wiki
FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
WHERE DATE(datehour) = '2018-01-10'
AND wiki IN (
  SELECT wiki
  FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
  WHERE DATE(datehour) = '2018-01-01'
  AND NOT wiki LIKE 'e%'
  LIMIT 3
)
GROUP BY 2

This query processes 0.341 GB instead of 10 GB.
