在bigquery中为_PARTITIONTIME使用子查询不会限制成本 [英] Using subquery for _PARTITIONTIME in bigquery does not limit cost

查看:143
本文介绍了在bigquery中为_PARTITIONTIME使用子查询不会限制成本的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我使用标准SQL在BQ上运行以下查询时,它说它将在运行时处理76.6TB

When I run the below query on BQ using standard SQL, it says it will process 76.6TB when running

SELECT 
event_time, user_id, activity_id,dbm_insertion_order_id, dbm_total_media_cost_usd 
FROM `raw.5295.activity_*`
WHERE _PARTITIONTIME >(SELECT * FROM `analytics-dwh.autobidding.activity_list` )
AND timestamp_micros(event_time)  > (SELECT timestamp_micros(MAX(event_time)) from `essence-analytics-dwh.ml_for_autobidding.nest_na_4q18_activity_updated_daily`)
AND _TABLE_SUFFIX IN ('25','20')

analytics-dwh.autobidding.activity_list表只有一列具有唯一的整数列表

The table analytics-dwh.autobidding.activity_list has just one column with a unique list of integers

如果我从上表中删除了子查询,则该查询在运行时使用的内存不足500GB

If I remove the subquery from the above table, the query uses less than 500GB when run

SELECT 
event_time, user_id, activity_id,dbm_insertion_order_id, dbm_total_media_cost_usd 
FROM `raw.5295.activity_*`
WHERE _PARTITIONTIME >TIMESTAMP('2018-12-20')
AND timestamp_micros(event_time)  > (SELECT timestamp_micros(MAX(event_time)) from `essence-analytics-dwh.ml_for_autobidding.nest_na_4q18_activity_updated_daily`)
AND _TABLE_SUFFIX IN ('25','20')

为什么在使用子查询时会发生这种情况?有解决方法吗?

Why does this happen when I use a subquery? Is there a workaround?

推荐答案

为什么使用子查询会发生这种情况?

Why does this happen with use of subquery?

通常,当可以在查询开始时就对过滤器进行评估而无需任何子查询评估或数据扫描时,分区修剪将减少查询成本.

In general, partition pruning will reduce query cost when the filters can be evaluated at the outset of the query without requiring any subquery evaluations or data scans.

您可以了解有关使用伪列查询的限制分区的更多信息

因此,在您的第一个查询(您在其中使用子查询)中-修剪没有发生(它不会根据涉及子查询的条件来限制分区的使用)

So, in your first query (where you use subquery) - the pruning is not happening (It does not limit use of partitions based on the condition that involves the subquery)

在第二个查询中,您使用_PARTITIONTIME >TIMESTAMP('2018-12-20')来限制分区

In second query you use _PARTITIONTIME >TIMESTAMP('2018-12-20') so partitions are limited

底线:不能使用_PARTITIONTIME上包含子查询的过滤器来限制为分区表扫描的分区数.

Bottom line: filters on _PARTITIONTIME that include subqueries can't be used to limit the number of partitions scanned for a partitioned table.

有解决方法吗?

Is there a workaround?

您应该将任务分为两个步骤:使用表analytics-dwh.autobidding.activity_list之外所需的任何逻辑为_PARTITIONTIME计算过滤器,然后使用它(而不是子查询)-使用任何

You should split your task to two steps: calculate filter for _PARTITIONTIME using whatever logic you need off of table analytics-dwh.autobidding.activity_list and then use it (instead of subquery) - using any client of your choice

这篇关于在bigquery中为_PARTITIONTIME使用子查询不会限制成本的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆