Postgres可以在部分索引where子句中使用函数吗? [英] Can Postgres use a function in a partial index where clause?

查看:67
本文介绍了Postgres可以在部分索引where子句中使用函数吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个很大的Postgres表,我想在其中建立部分索引的2列中的1列.是否可以在部分索引的where子句中使用Postgres函数,然后让select查询利用该部分索引?

I have a large Postgres table where I want to partial index on 1 of the 2 columns indexed. Can I and how do I use a Postgres function in the where clause of a partial index and then have the select query utilize that partial index?

示例场景

第一列是杂志",第二列是卷",第三列是问题".所有杂志都可以具有相同的卷"和问题"#,但是我希望索引仅包含该杂志的两个最新卷.这是因为一本杂志可能比其他杂志更老,并且数量比年轻杂志还要高.

First column is "magazine" and the second column is "volume" and the third column is "issue". All the magazines can have same "volume" and "issue" #'s but I want the index to only contain the two most recent volumes for that magazine. This is because a magazine could be older than others and have higher volume numbers than younger magazines.

创建了两个不变的严格函数来确定杂志f_current_volume('gq')和f_previous_volume('gq')的当前和去年的体积.注意:当前/过去的数量#每年仅更改一次.

Two immutable strict functions were created to determine the current and last years volumes for a magazine f_current_volume('gq') and f_previous_volume('gq'). Note: current/past volume # only changes once per year.

我尝试使用函数创建部分索引,但是在查询上使用解释时,它只会对当前的卷杂志进行seq扫描.

I tried creating a partial index with the functions however when using explain on a query it only does a seq scan for a current volume magazine.



CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume ) 
  WHERE volume IN (f_current_volume(magazine), f_previous_volume(magazine));

-- Both these do seq scans.
select * from issue where magazine = 'gq' and volume = 100;
select * from issue where magazine = 'gq' and volume = f_current_volume('gq');

要完成这项工作,我做错了什么?并且如果可能的话,为什么Postgres需要使用这种方式来使用索引?

What am I doing wrong to get this work? And if it is possible why does it need to be done that way for Postgres to use the index?



-- UPDATE: 2013-06-17, the following surprisingly used the index.
-- Why would using a field name rather than value allow the index to be used?
select * from issue where magazine = 'gq' and volume = f_current_volume(magazine);

推荐答案

不可移植性和当前"

如果您的 f_current_volume 函数曾经改变其行为-如其名称所暗示,并且存在 f_previous_volume 函数,则数据库可以自由返回完全伪造的结果.

Immutability and 'current'

If your f_current_volume function ever changes its behaviour - as is implied by its name, and the presence of an f_previous_volume function, then the database is free to return completely bogus results.

PostgreSQL会拒绝您创建索引,因为您只能使用 IMMUTABLE 函数.事实是,根据文档标记功能 IMMUTABLE 意味着告诉 PostgreSQL 有关该功能的行为.您说的是我保证此函数的结果不会更改,请随时在此基础上进行假设."

PostgreSQL would've refused to let you create the index, complaining that you can only use IMMUTABLE functions. The thing is, marking a function IMMUTABLE means that you are telling PostgreSQL something about the function's behaviour, as per the documentation. You're saying "I promise this function's results won't change, feel free to make assumptions on that basis."

做出的最大假设之一是建立索引时.如果函数针对多次调用的不同输入返回不同的输出,则结果为 splat .或者,如果不幸的话,可能是 boom .从理论上讲,您可以通过 REINDEX 更改所有内容来摆脱不可变的函数,但是唯一真正安全的方法是 DROP 使用它的每个索引, DROP 函数,使用新定义重新创建该函数,然后重新创建索引.

One of the biggest assumptions made is when building an index. If the function returns different outputs for different inputs on multiple invocations, things go splat. Or possibly boom if you're unlucky. In theory you can kind-of get away with changing an immutable function by REINDEXing everything, but the only really safe way is to DROP every index that uses it, DROP the function, re-create the function with its new definition and re-create the indexes.

如果您很少更改某些东西,但实际上在不同的时间点确实有两个不同的不可变函数,而它们恰好具有相同的名称,那么这样做确实非常有用.

That can actually be really useful to do if you have something that changes only infrequently, but you really have two different immutable functions at different points in time that just happen to have the same name.

PostgreSQL的部分索引匹配非常愚蠢-但是,正如我在为此编写测试用例时发现的那样,它比以前聪明得多.它忽略一个哑的 OR true .它将 WHERE(a%100 = 0或a%1000 = 0)上的索引用于 WHERE a = 100 的查询.它甚至具有不可内嵌的身份功能:

PostgreSQL's partial index matching is pretty dumb - but, as I found when writing test cases for this, a lot smarter than it used to be. It ignores a dummy OR true. It uses an index on WHERE (a%100=0 OR a%1000=0) for a WHERE a = 100 query. It even got it with a non-inline-able identity function:

regress=> CREATE TABLE partial AS SELECT x AS a, x 
          AS b FROM generate_series(1,10000) x;
regress=> CREATE OR REPLACE FUNCTION identity(integer) 
          RETURNS integer AS $$
          SELECT $1; 
          $$ LANGUAGE sql IMMUTABLE STRICT;
regress=> CREATE INDEX partial_b_fn_idx 
          ON partial(b) WHERE (identity(b) % 1000 = 0);
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Index Only Scan using partial_b_fn_idx on partial  (cost=0.00..13.05 rows=50 width=4)
(1 row)

但是,它无法证明 IN 子句是否匹配,例如:

However, it was unable to prove the IN clause match, eg:

regress=> DROP INDEX partial_b_fn_idx;
regress=> CREATE INDEX partial_b_fn_in_idx ON partial(b)
          WHERE (b IN (identity(b), 1));
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
                               QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on partial  (cost=10000000000.00..10000000195.00 rows=50 width=4)

那么我的建议是?将 IN 重写为 OR 列表:

So my advice? Rewrite IN as an OR list:

CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume ) 
  WHERE (volume = f_current_volume(magazine) OR volume = f_previous_volume(magazine));

...,并且在当前版本中,只要您牢记上面概述的不变性规则,它就可能起作用.好吧,第二个版本:

... and on a current version it might just work, so long as you keep the immutability rules outlined above in mind. Well, the second version:

select * from issue where magazine = 'gq' and volume = f_current_volume('gq');

可能.更新:不,不会.要使用它,Pg必须识别出 magazine ='gq',并意识到 f_current_volume('gq')因此等同于 f_current_volume(magazine).它不会尝试通过部分索引匹配证明该级别的等效性,因此,如您在更新中所指出的,您必须直接编写 f_current_volume(magazine).我应该已经发现了.从理论上讲,如果计划者足够聪明,PostgreSQL可以在第二个查询中使用索引,但是我不确定您如何有效地寻找这样的替换值得的地方.

might. Update: No, it won't; for it to be used, Pg would have to recognise that magazine='gq' and realise that f_current_volume('gq') was therefore equiavalent to f_current_volume(magazine). It doesn't attempt to prove equivalences on that level with partial index matching, so as you've noted in your update you have to write f_current_volume(magazine) directly. I should've spotted that. In theory PostgreSQL could use the index with the second query if the planner was smart enough, but I'm not sure how you'd go about efficiently looking for places where a substitution like this would be worthwhile.

第一个示例 volume = 100 将永远不会使用索引,因为在查询计划时PostgreSQL不知道 f_current_volumne('gg'); 将评估为 100 .您可以在部分索引 WHERE 子句中添加OR子句 OR volume = 100 ,然后PostgreSQL会解决这个问题.

The first example, volume = 100 will never use the index, since at query planning time PostgreSQL has no idea that f_current_volumne('gg'); will evaluate to 100. You could add an OR clause OR volume = 100 to your partial index WHERE clause and PostgreSQL would figure it out then, though.

这篇关于Postgres可以在部分索引where子句中使用函数吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆