PostgreSQL: Why is this query not using my index?


Question

I'm running this query on a database (all the numbers and column names are made up):

select * from t where a=1 and b=11 and c!=5 and d<8

t has an index:

create index i on t (a,b,c,d)

When I run "EXPLAIN ANALYZE", the query performs a sequential scan and takes approximately 55 ms. If I modify the query like this:

select * from t where a=1 and b=11 and c=5 and d<8
                                       ^

It uses the index and completes in 0.5 ms. So it must be the NOT EQUALS, right? Not so, because if I run this query:

select * from t where a=1 and b=11 and c=5 and d!=8
                                               ^

The query still uses the index. But if I try this, the index is not used:

select * from t where a=1 and b=11 and c<5 and d<8
                                       ^

So why is Postgres behaving this way? It seems very strange to me.

Recommended answer

As you already realized, the problem is related to using operators other than equals. An index can be used most efficiently only for the leftmost columns that are compared using equals (plus one range condition).

In your example:

create index i on t (a,b,c,d);
where a=1 and b=11 and c!=5 and d<8;

It can use the index efficiently only for a and b. That means the DB fetches all rows matching the conditions on a and b and then checks each row against the remaining conditions.
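To observe this yourself, a throwaway setup along the following lines may help. This is only a sketch: the question does not give the real schema or data, so the column types, the row count and the value ranges below are assumptions, and the plan you get depends on your own data distribution.

```sql
-- Hypothetical test table; the question does not show the real schema.
CREATE TABLE t (
    a integer,
    b integer,
    c integer,
    d integer
);

-- Fill it with made-up data: 1,000,000 rows of small random integers.
INSERT INTO t (a, b, c, d)
SELECT (random() * 10)::int,
       (random() * 20)::int,
       (random() * 10)::int,
       (random() * 20)::int
FROM generate_series(1, 1000000);

-- The index from the question.
CREATE INDEX i ON t (a, b, c, d);

-- Refresh planner statistics so the cost estimates reflect the data.
ANALYZE t;

-- Compare the plans of two variants from the question.
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 5 AND d < 8;
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 11 AND c  = 5 AND d < 8;
```

In the EXPLAIN output, conditions listed under "Index Cond" are evaluated inside the index, while those under "Filter" are checked row by row afterwards.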

When you change the filter on c to equals, it fetches (potentially) fewer rows (only those matching a, b and c) and then checks those rows against the d filter. Using the index is more efficient in this case.

In general, the PostgreSQL query planner evaluates both options: (1) using the index; (2) doing a sequential scan (Seq Scan). For each, it calculates a cost value: the higher the cost, the worse the expected performance. It then picks the plan with the lower cost. That is how it decides whether to use the index; there is no fixed threshold.
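One way to look at both cost estimates side by side is to discourage sequential scans temporarily and compare the two EXPLAIN outputs. The following is a minimal sketch using the table and query from the question; enable_seqscan does not forbid sequential scans, it only makes the planner treat them as very expensive.

```sql
BEGIN;

-- Plan the planner picks by default, with its estimated cost.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 5 AND d < 8;

-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;

-- The alternative plan and its estimated cost.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 5 AND d < 8;

-- Nothing was modified; ROLLBACK also discards the SET LOCAL.
ROLLBACK;
```

If the second plan uses the index but shows a higher total cost than the first, the planner chose the sequential scan because it estimated it to be cheaper, not because the index was unusable.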

Finally, I wrote "plus one range condition" above. That means the index can be used in the most efficient way not only for equality comparisons, but also for one single range condition.

Considering that you have one single range condition in your query, I'd suggest changing the index like this:

create index i on t (a,b,d,c);

Now it can use the filters on a, b and d efficiently with the index and only needs to apply the c!=5 filter to the rows it fetches. Although this index can be used more efficiently for your query than your original one, that doesn't automatically mean PG will use it. It depends on the cost estimates. But give it a try.
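Trying it out could look like the sketch below. It assumes the original index i can simply be dropped, which may not be acceptable on a production system where other queries rely on it.

```sql
-- Replace the original index with the reordered one.
DROP INDEX i;
CREATE INDEX i ON t (a, b, d, c);

-- Check which plan the planner chooses now and how long the query takes.
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 5 AND d < 8;
```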

Finally, if this isn't fast enough and the value 5 in the expression c!=5 is constant, you might consider a partial index:

 create index i on t (a,b,d)
        where c!=5;

You could do that with all the other columns too, if the values you compare them against are constants.
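Keep in mind that the planner considers a partial index only when the query's WHERE clause implies the index predicate. A small sketch of that, where the index name i_partial is hypothetical and chosen only to avoid clashing with the existing index i:

```sql
-- Partial index covering only the rows with c != 5.
CREATE INDEX i_partial ON t (a, b, d) WHERE c != 5;

-- Eligible for the partial index: the query repeats the index predicate.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 5 AND d < 8;

-- Not eligible: c != 6 does not imply c != 5, so rows with c = 5
-- could match the query but are missing from the partial index.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 11 AND c != 6 AND d < 8;
```

Whether the eligible query actually uses i_partial still depends on the cost estimates.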

References:

  • Indexing >, < and BETWEEN
  • Indexing multiple independent range conditions (not!)
