Wildcard table matches with _TABLE_SUFFIX and sub-query
Problem description
The _TABLE_SUFFIX feature is great and exactly what I need to solve my problem. However, when I use a sub-query to determine which tables to match, it scans all of the data matched by the wildcard.
If I filter _TABLE_SUFFIX with an operation such as =, BETWEEN, or IN against a set of literal values, the amount of data scanned drops compared with using the wildcard alone:
SELECT sample_data FROM `test.dataset.*`
WHERE _TABLE_SUFFIX IN ("NWD1","NWD2","NWD3","NWD4","NWD5")
- 1.8 GB scanned
However if I do the following:
SELECT sample_data FROM `test.dataset.*`
WHERE _TABLE_SUFFIX IN (SELECT ID FROM subset)
- 50 GB scanned (the sub-select returns the same values as the explicit IN list above)
Answer
Constant filters on _TABLE_SUFFIX will reduce the amount of data queried, but filters whose values come from a dynamic subquery will not.
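One common workaround, offered here as a hedged sketch rather than as part of the original answer, is to resolve the subquery first and hand its result to the wildcard query as a fixed value. The BigQuery script below assumes that `subset` is the question's table (dataset-qualify it in practice), that `ID` is a STRING column, and that a script variable is treated like a constant when the second statement is planned. Check the bytes processed before relying on it.

```sql
-- Step 1: evaluate the subquery once and store the result in a script variable.
-- IGNORE NULLS avoids the "array cannot have a null element" error;
-- this assumes ID is a STRING (add CAST(ID AS STRING) otherwise).
DECLARE suffixes ARRAY<STRING> DEFAULT (
  SELECT ARRAY_AGG(DISTINCT ID IGNORE NULLS)
  FROM subset
);

-- Step 2: the variable's value is already fixed when this statement runs,
-- so the _TABLE_SUFFIX filter behaves like a literal IN list
-- (assumption: this lets BigQuery skip the non-matching tables).
SELECT sample_data
FROM `test.dataset.*`
WHERE _TABLE_SUFFIX IN UNNEST(suffixes);
```

The same idea works from a client application: run the subquery, then pass the IDs as an array query parameter to the second query.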
As an alternative, have you considered clustered storage?
Clustered tables are able to optimize the amount of data queried, even when filtering with a dynamic subquery.
SELECT sample_data FROM `test.dataset.*`
WHERE clustered_column IN (SELECT ID FROM subset)
would work.
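Here is one hedged sketch of a move to clustered storage: copy the shards into a single table clustered on a column that holds the old suffix. The destination `test.dataset_combined.samples` and the column name `table_id` are hypothetical, and only `sample_data` is carried over to keep the example short.

```sql
-- Hypothetical destination in a separate dataset, so that the new table
-- is not itself matched by the `test.dataset.*` wildcard.
CREATE TABLE `test.dataset_combined.samples`
CLUSTER BY table_id
AS
SELECT
  _TABLE_SUFFIX AS table_id,  -- keep the former shard name as a real column
  sample_data
FROM `test.dataset.*`;

-- The dynamic filter now targets the clustering column instead of _TABLE_SUFFIX.
SELECT sample_data
FROM `test.dataset_combined.samples`
WHERE table_id IN (SELECT ID FROM subset);
```

Clustering without a partition column is allowed. If the data has a natural date column, adding `PARTITION BY` on it, as the public pageviews table below does with `datehour`, can reduce the scan further.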
For example, the following query
SELECT MAX(title), wiki
FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
WHERE DATE(datehour) = '2018-01-10'
AND wiki IN (
SELECT wiki
FROM `fh-bigquery.wikipedia_v3.pageviews_2018`
WHERE DATE(datehour) = '2018-01-01'
AND NOT wiki LIKE 'e%'
LIMIT 3
)
GROUP BY 2
scans 0.341 GB instead of 10 GB.
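To compare runs like these (literal list versus subquery, sharded versus clustered), you can read back the bytes processed by recent queries from `INFORMATION_SCHEMA`. This sketch assumes the jobs ran in the US multi-region (`region-us`). Change that qualifier for other locations.

```sql
-- Assumption: jobs ran in the US multi-region; adjust `region-us` if not.
SELECT
  creation_time,
  job_id,
  total_bytes_processed / POW(1024, 3) AS gib_processed,  -- bytes to GiB
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 20;
```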