如果没有可用于分区消除的过滤器,则无法查询表 [英] Cannot query over table without a filter that can be used for partition elimination

查看:22
本文介绍了如果没有可用于分区消除的过滤器,则无法查询表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个分区表,很想使用 MERGE 语句,但由于某种原因不起作用.

I have a partitioned table and would love to use a MERGE statement, but for some reason doesn't work out.

MERGE `wr_live.p_email_event`  t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)

我明白了

不能在没有过滤器的情况下查询表 'wr_live.p_email_event'可用于分区消除.

Cannot query over table 'wr_live.p_email_event' without a filter that can be used for partition elimination.

正确的语法是什么?还有一种方法可以表达更短的插入内容吗?不命名所有列?

What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?

推荐答案

正确的语法是什么?

What's the proper syntax?

正如您从错误消息中看到的那样 - 您的分区 wr_live.p_email_event 表是在 require partition filter 设置为 true 的情况下创建的.这意味着对该表的任何查询都必须对各自的分区字段进行过滤

As you can see from error message - your partitioned wr_live.p_email_event table was created with require partition filter set to true. This mean that any query over this table must have some filter on respective partitioned field

假设 timestamp 是分区字段 - 您可以执行以下操作

Assuming that timestamp IS that partitioned field - you can do something like below

MERGE `wr_live.p_email_event`  t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE()  -- this is the filter you should tune 
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)   

因此,您需要在以下行中进行设置,以便它实际上不会过滤掉您需要参与的任何内容

So you need to make below line such that it in reality does not filter out whatever you need to be involved

AND DATE(t.timestamp) <> CURRENT_DATE()  -- this is the filter you should tune 

例如,我发现,将其设置为将来的时间戳 - 在许多情况下可以解决该问题,例如

For example, I found, setting it to timestamp in future - in many cases addresses the issue, like

AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)   

当然,如果您的 wr_live.email_event 表也分区,require partition filter 设置为 true - 您需要为 s.timestamp 添加相同的过滤器

Of course, if your wr_live.email_event table also partitioned with require partition filter set to true - you need to add same filter for s.timestamp

还有一种方法可以表达更短的插入内容吗?不命名所有列?

Also is there a way I can express shorter the insert stuff? without naming all columns?

BigQuery DML 的 INSERT 需要指定列名 - 没有办法(至少我知道)使用 INSERT 语句避免它
同时,您可以通过从查询结果中使用 DDL 的 CREATE TABLE 来避免这种情况.这不需要列出列

BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it using INSERT statement
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns

例如,类似于下面的内容

For example, something like below

CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event` 
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL 
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
  SELECT 1 FROM `wr_live.p_email_event` t
  WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
  AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)

您可能还想通过 OPTIONS() 包含表选项列表 - 但看起来过滤器属性尚不受支持 - 所以如果您有/需要它 - 上面将擦除"此属性 :o(

You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(

这篇关于如果没有可用于分区消除的过滤器,则无法查询表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆