unaccent() preventing index usage in Postgres


Question

I want to retrieve a way with a given name from an OpenStreetMap database imported into PostgreSQL 9.3.5 (the OS is Windows 7, 64-bit). To be somewhat fault tolerant, I use the unaccent extension of Postgres.

My query looks like this:

SELECT * FROM germany.ways
WHERE lower(tags->'name') like lower(unaccent('unaccent','Weststrasse'))

Query plan:

Seq Scan on ways  (cost=0.00..2958579.31 rows=122 width=465)
  Filter: (lower((tags -> 'name'::text)) ~~ lower(unaccent('unaccent'::regdictionary, 'Weststrasse'::text)))

The strange thing is that this query uses a sequential scan on ways, although an index is present on lower(tags->'name'):

CREATE INDEX ways_tags_name ON germany.ways (lower(tags -> 'name'));

Postgres uses the index as soon as I remove unaccent from the query:

SELECT * FROM germany.ways
WHERE lower(tags->'name') like lower('Weststrasse')

Query plan:

Index Scan using ways_tags_name on ways  (cost=0.57..495.43 rows=122 width=465)
  Index Cond: (lower((tags -> 'name'::text)) = 'weststrasse'::text)
  Filter: (lower((tags -> 'name'::text)) ~~ 'weststrasse'::text)

Why is unaccent preventing Postgres from using the index? In my opinion this doesn't make sense, because the result of unaccent (diacritics removal, etc.) should already be completely known before the actual query is executed. So Postgres should be able to use the index. How can the seq scan be avoided when using unaccent?

Recommended answer

IMMUTABLE variant of unaccent()

To clarify the misinformation in the currently accepted, incorrect answer:
Expression indexes only allow IMMUTABLE functions (for obvious reasons), and unaccent() is only STABLE. The solution you suggested in the comment is also problematic. Detailed explanation and a proper solution for that:

  • Does PostgreSQL support "accent insensitive" collations?

Depending on the content of tags->'name', it may be useful to add unaccent() to the expression index, but that is orthogonal to the question of why the index wasn't being used:

  • PostgreSQL accent + case insensitive search
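
A minimal sketch of how unaccent() could be added to the expression index. Since expression indexes only accept IMMUTABLE functions, the call goes through a wrapper function. The names f_unaccent and ways_tags_name_unaccent are hypothetical, and the sketch assumes the unaccent extension is installed in schema public. Declaring the wrapper IMMUTABLE is a promise that the dictionary never changes; if the unaccent rules are modified later, the index would have to be rebuilt.

```sql
-- Hypothetical wrapper: schema-qualifies both the function and the dictionary
-- so the result does not depend on the current search_path.
-- Assumes the unaccent extension lives in schema "public".
CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1)
$func$ LANGUAGE sql IMMUTABLE;

-- Expression index on the normalized (unaccented, lower-cased) name.
CREATE INDEX ways_tags_name_unaccent
  ON germany.ways (lower(f_unaccent(tags -> 'name')));

-- The query has to repeat the indexed expression for the index to apply.
SELECT *
FROM   germany.ways
WHERE  lower(f_unaccent(tags -> 'name')) = lower(f_unaccent('Weststrasse'));
```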

The operator LIKE in your query is subtly wrong (most likely). You do not want to interpret 'Weststrasse' as a search pattern; you want to match the (normalized) string as is. Replace it with the = operator, and you will see a (bitmap) index scan with your current index, regardless of the function volatility of unaccent():

SELECT * FROM germany.ways
WHERE lower(tags->'name') = lower(unaccent('unaccent','Weststrasse'))
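
To check which plan the planner actually picks, the statement can be prefixed with EXPLAIN. This is only a sketch for verification; the resulting plan depends on table statistics and configuration, so no output is shown here.

```sql
-- Shows the chosen plan without executing the query.
EXPLAIN
SELECT * FROM germany.ways
WHERE lower(tags->'name') = lower(unaccent('unaccent','Weststrasse'));
```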



Why?

The right operand of LIKE is a pattern. Postgres cannot use a plain btree index for pattern matching (exceptions apply). A LIKE with a plain string as pattern (no special characters) can be optimized with an equality check on the btree index. But if there are special characters in the string, this index is out.
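
As an illustration of one such exception: left-anchored patterns can use a btree index that was created with the text_pattern_ops operator class. The index name below is hypothetical, and the example pattern is only an assumption about what a prefix search might look like.

```sql
-- Hypothetical btree index that supports left-anchored LIKE patterns
-- even when the database does not use the C locale.
CREATE INDEX ways_tags_name_pattern
  ON germany.ways (lower(tags -> 'name') text_pattern_ops);

-- Left-anchored pattern: the wildcard appears only at the end.
SELECT *
FROM   germany.ways
WHERE  lower(tags -> 'name') LIKE 'weststr%';
```

Patterns with a leading wildcard (such as '%strasse') are not covered by this kind of index.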

If there is an IMMUTABLE function to the right of LIKE, it can be evaluated immediately, and the said optimization is still possible. Per the documentation on Function Volatility Categories:


IMMUTABLE ...
This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments.

The same is not possible with a lesser function volatility (STABLE or VOLATILE). That's why your "solution" of faking an IMMUTABLE unaccent() seemed to work, but it's really putting lipstick on a pig.
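
The declared volatility of a function can be looked up in the system catalog pg_proc. A small sketch, assuming the unaccent extension is installed in the current database:

```sql
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT p.oid::regprocedure AS function_signature,
       p.provolatile
FROM   pg_proc p
WHERE  p.proname = 'unaccent';
```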

To reiterate:


  • If you want to work with LIKE and patterns, use a trigram index.
  • If you don't want to work with LIKE and patterns, use the equality operator =.
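
For the first case, a trigram index could look roughly like the following sketch. It relies on the additional module pg_trgm; the index name is hypothetical and the search pattern is only an example.

```sql
-- pg_trgm provides the trigram operator classes.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical GIN trigram index on the lower-cased name.
CREATE INDEX ways_tags_name_trgm
  ON germany.ways USING gin (lower(tags -> 'name') gin_trgm_ops);

-- Supports patterns that are not left-anchored.
SELECT *
FROM   germany.ways
WHERE  lower(tags -> 'name') LIKE '%weststr%';
```

A GiST index with gist_trgm_ops is the alternative; which one fits better depends on the read/write profile of the table.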
