没有可用于分区消除的过滤器,无法查询表 [英] Cannot query over table without a filter that can be used for partition elimination
问题描述
我有一个分区表,很想使用MERGE语句,但是由于某些原因无法解决问题.
I have a partitioned table and would love to use a MERGE statement, but for some reason doesn't work out.
MERGE `wr_live.p_email_event` t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
我知道
在没有过滤器的情况下无法查询表"wr_live.p_email_event" 可用于消除分区.
Cannot query over table 'wr_live.p_email_event' without a filter that can be used for partition elimination.
什么是正确的语法?还有一种方法可以表达较短的插入内容吗?没有命名所有列?
What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?
推荐答案
什么是正确的语法?
What's the proper syntax?
从错误消息中可以看到-已分区wr_live.p_email_event
表是在require partition filter
设置为true的情况下创建的.这意味着对该表的任何查询都必须在相应的分区字段
As you can see from error message - your partitioned wr_live.p_email_event
table was created with require partition filter
set to true. This mean that any query over this table must have some filter on respective partitioned field
假设timestamp
是该分区字段-您可以执行以下操作
Assuming that timestamp
IS that partitioned field - you can do something like below
MERGE `wr_live.p_email_event` t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE() -- this is the filter you should tune
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
因此,您需要做以下一行,以便实际上不会过滤掉您需要涉及的任何内容
So you need to make below line such that it in reality does not filter out whatever you need to be involved
AND DATE(t.timestamp) <> CURRENT_DATE() -- this is the filter you should tune
例如,我发现将其设置为将来的时间戳记-在许多情况下都可以解决该问题,例如
For example, I found, setting it to timestamp in future - in many cases addresses the issue, like
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
当然,如果您的wr_live.email_event
表也已将require partition filter
设置为true进行分区-您需要为s.timestamp添加相同的过滤器
Of course, if your wr_live.email_event
table also partitioned with require partition filter
set to true - you need to add same filter for s.timestamp
还有一种方法可以表达较短的插入内容吗?没有命名所有列?
Also is there a way I can express shorter the insert stuff? without naming all columns?
BigQuery DML的INSERT需要column names to be specified
-无法(至少据我所知)使用INSERT语句来避免它
同时,您可以通过在查询结果中使用DDL的CREATE TABLE来避免这种情况.不需要列出列
BigQuery DML's INSERT requires column names to be specified
- there is no way (at least that I am aware of) to avoid it using INSERT statement
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns
例如,下面的内容
CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event`
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
SELECT 1 FROM `wr_live.p_email_event` t
WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)
您可能还希望通过OPTIONS()包括表选项列表-但看起来还不支持filter属性-因此,如果确实有/需要它-上面将擦除"此属性:o(
You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(
这篇关于没有可用于分区消除的过滤器,无法查询表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!