没有可用于分区消除的过滤器,无法查询表 [英] Cannot query over table without a filter that can be used for partition elimination

查看:106
本文介绍了没有可用于分区消除的过滤器,无法查询表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个分区表,很想使用MERGE语句,但是由于某些原因无法解决问题.

I have a partitioned table and would love to use a MERGE statement, but for some reason doesn't work out.

MERGE `wr_live.p_email_event`  t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)

我知道

在没有过滤器的情况下无法查询表"wr_live.p_email_event" 可用于消除分区.

Cannot query over table 'wr_live.p_email_event' without a filter that can be used for partition elimination.

什么是正确的语法?还有一种方法可以表达较短的插入内容吗?没有命名所有列?

What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?

推荐答案

什么是正确的语法?

What's the proper syntax?

从错误消息中可以看到-已分区wr_live.p_email_event表是在require partition filter设置为true的情况下创建的.这意味着对该表的任何查询都必须在相应的分区字段

As you can see from error message - your partitioned wr_live.p_email_event table was created with require partition filter set to true. This mean that any query over this table must have some filter on respective partitioned field

假设timestamp是该分区字段-您可以执行以下操作

Assuming that timestamp IS that partitioned field - you can do something like below

MERGE `wr_live.p_email_event`  t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE()  -- this is the filter you should tune 
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)   

因此,您需要做以下一行,以便实际上不会过滤掉您需要涉及的任何内容

So you need to make below line such that it in reality does not filter out whatever you need to be involved

AND DATE(t.timestamp) <> CURRENT_DATE()  -- this is the filter you should tune 

例如,我发现将其设置为将来的时间戳记-在许多情况下都可以解决该问题,例如

For example, I found, setting it to timestamp in future - in many cases addresses the issue, like

AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)   

当然,如果您的wr_live.email_event表也已将require partition filter设置为true进行分区-您需要为s.timestamp添加相同的过滤器

Of course, if your wr_live.email_event table also partitioned with require partition filter set to true - you need to add same filter for s.timestamp

还有一种方法可以表达较短的插入内容吗?没有命名所有列?

Also is there a way I can express shorter the insert stuff? without naming all columns?

BigQuery DML的INSERT需要column names to be specified-无法(至少据我所知)使用INSERT语句来避免它
同时,您可以通过在查询结果中使用DDL的CREATE TABLE来避免这种情况.不需要列出列

BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it using INSERT statement
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns

例如,下面的内容

CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event` 
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL 
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
  SELECT 1 FROM `wr_live.p_email_event` t
  WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
  AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)

您可能还希望通过OPTIONS()包括表选项列表-但看起来还不支持filter属性-因此,如果确实有/需要它-上面将擦除"此属性:o(

You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(

这篇关于没有可用于分区消除的过滤器,无法查询表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆