Does PostgreSQL support "accent insensitive" collations?

Views: 139 Published: 2018/8/2 12:48:18 Tags: sql postgresql localization indexing pattern-matching

Question

In Microsoft SQL Server, it's possible to specify an "accent insensitive" collation (for a database, table or column), which means that it's possible for a query like

SELECT * FROM users WHERE name LIKE 'João'

to find a row with a Joao name.

I know that it's possible to strip accents from strings in PostgreSQL using the unaccent_string contrib function, but I'm wondering if PostgreSQL supports these "accent insensitive" collations so the SELECT above would work.

Recommended answer

Index

To use an index for that kind of query, create an index on the expression. However, Postgres only accepts IMMUTABLE functions for indexes. If a function can return a different result for the same input, the index could silently break.

Unfortunately, unaccent() is only STABLE, not IMMUTABLE. According to this thread on pgsql-bugs, this is due to three reasons:

It depends on the behavior of a dictionary.
There is no hard-wired connection to this dictionary.
It therefore also depends on the current search_path, which can change easily.
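
The declared volatility can be checked in the system catalog. This is a minimal sketch, assuming the unaccent extension is installed in the current database; it only reads pg_proc and pg_namespace.

```sql
-- Assumes the unaccent extension is installed in the current database.
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT p.oid::regprocedure AS function_signature
     , n.nspname           AS schema_name
     , p.provolatile       AS volatility
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
WHERE  p.proname = 'unaccent';
```

The query lists every overload of unaccent() together with the schema it lives in, which is also the schema name to hard-wire later.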

Some tutorials on the web instruct to just alter the function volatility to IMMUTABLE. This brute-force method can break under certain conditions.

Others suggest a simple IMMUTABLE wrapper function (like I did myself in the past).

There is an ongoing debate whether to make the variant with two parameters IMMUTABLE which declares the used dictionary explicitly. Read here or here.

Another alternative would be this module with an IMMUTABLE unaccent() function by Musicbrainz, provided on Github. Haven't tested it myself. I think I have come up with a better idea:

I propose an approach that is at least as efficient as other solutions floating around, but safer: Create a wrapper function with the two-parameter form and "hard-wire" the schema for function and dictionary:

CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1)  -- schema-qualify function and dictionary
$func$  LANGUAGE sql IMMUTABLE;

public being the schema where you installed the extension (public is the default).
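
If it is unclear where the extension lives, something like the following can install it and report its schema. A sketch only, assuming you have the privileges to create extensions; if unaccent is already installed in another schema, the first statement is skipped and the query shows the schema to use instead of public.

```sql
-- Install the contrib module into the public schema
-- (skipped with a notice if the extension already exists).
CREATE EXTENSION IF NOT EXISTS unaccent SCHEMA public;

-- Report the schema the extension actually lives in.
SELECT e.extname
     , n.nspname AS schema_name
FROM   pg_extension e
JOIN   pg_namespace n ON n.oid = e.extnamespace
WHERE  e.extname = 'unaccent';
```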

Previously, I had added SET search_path = public, pg_temp to the function - until I discovered that the dictionary can be schema-qualified, too, which is currently (pg 10) not documented. This version is a bit shorter and around twice as fast in my tests on pg 9.5 and pg 10.
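
For comparison, the earlier search_path variant might have looked roughly like this. It is a reconstruction from the description above, not necessarily the exact code used at the time, and the function name f_unaccent_search_path is hypothetical.

```sql
-- Assumes the unaccent extension is installed in schema public.
CREATE OR REPLACE FUNCTION f_unaccent_search_path(text)
  RETURNS text AS
$func$
SELECT unaccent('unaccent', $1)  -- unqualified names, resolved via the function's own search_path
$func$  LANGUAGE sql IMMUTABLE
        SET search_path = public, pg_temp;
```

The SET clause pins the search_path for the duration of each call, which is what makes name resolution predictable here, at the cost of the extra overhead mentioned above.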

The updated version still doesn't allow function inlining because functions declared IMMUTABLE may not call non-immutable functions in the body to allow that. Hardly matters for performance while we make use of an expression index on this IMMUTABLE function:

CREATE INDEX users_unaccent_name_idx ON users(f_unaccent(name));

Adapt your queries to match the index (so the query planner can use it):

SELECT * FROM users
WHERE  f_unaccent(name) = f_unaccent('João');

You don't need the function in the right expression. You can supply unaccented strings like 'Joao' directly.
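
Whether the planner actually picks the expression index can be checked with EXPLAIN. A sketch, assuming the users table, f_unaccent() and users_unaccent_name_idx from above exist; on a very small table the planner may still prefer a sequential scan, so test with realistic data volumes.

```sql
-- Assumes the users table, f_unaccent() and users_unaccent_name_idx from above.
EXPLAIN
SELECT * FROM users
WHERE  f_unaccent(name) = 'Joao';
```

The expression in the WHERE clause has to match the indexed expression exactly; a query filtering on name alone cannot use this index.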

In Postgres 9.5 or older, ligatures like 'Œ' or 'ß' have to be expanded manually (if you need that), since unaccent() always substitutes a single letter:

SELECT unaccent('Œ Æ œ æ ß');

unaccent
----------
E A e a S
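
One way to do the manual expansion on those older versions is a chain of replace() calls applied before unaccent(). This is a minimal sketch: the helper name f_expand_ligatures is hypothetical, and it only covers the five characters from the example above, so extend the list for your data.

```sql
-- Hypothetical helper: expand a handful of ligatures to two-letter forms.
CREATE OR REPLACE FUNCTION f_expand_ligatures(text)
  RETURNS text AS
$func$
SELECT replace(replace(replace(replace(replace($1,
         'Œ', 'OE'),
         'Æ', 'AE'),
         'œ', 'oe'),
         'æ', 'ae'),
         'ß', 'ss')
$func$  LANGUAGE sql IMMUTABLE;

SELECT f_expand_ligatures('Œ Æ œ æ ß');  -- returns 'OE AE oe ae ss'
```

It uses only built-in functions, so it can be nested inside the wrapper, for instance f_unaccent(f_expand_ligatures(name)), and indexed the same way.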

You will love this update to unaccent in Postgres 9.6:

Extend contrib/unaccent's standard unaccent.rules file to handle all diacritics known to Unicode, and expand ligatures correctly (Thomas Munro, Léonard Benedetti)

Bold emphasis mine. Now we get:

SELECT unaccent('Œ Æ œ æ ß');

unaccent
----------
OE AE oe ae ss

Pattern matching

For LIKE or ILIKE with arbitrary patterns, combine this with the module pg_trgm in PostgreSQL 9.1 or later. Create a trigram GIN (typically preferable) or GIST expression index. Example for GIN:

CREATE INDEX users_unaccent_name_trgm_idx ON users
USING gin (f_unaccent(name) gin_trgm_ops);
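
The trigram operator classes are provided by pg_trgm, so the extension has to be installed before the index can be created. The sketch below also shows the GiST counterpart of the same index; the index name is hypothetical, and it assumes the users table and f_unaccent() from above.

```sql
-- gin_trgm_ops and gist_trgm_ops come from pg_trgm,
-- so the extension has to exist before either index can be created.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GiST variant of the same expression index (index name is hypothetical).
CREATE INDEX users_unaccent_name_trgm_gist_idx ON users
USING gist (f_unaccent(name) gist_trgm_ops);
```

Normally only one of the two trigram indexes is needed; pick GIN or GiST depending on your read and write patterns.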

The trigram index can be used for queries like:

SELECT * FROM users
WHERE  f_unaccent(name) LIKE ('%' || f_unaccent('João') || '%');

GIN and GIST indexes are more expensive to maintain than plain btree:

Difference between GiST and GIN index

There are simpler solutions for just left-anchored patterns. More about pattern matching and performance:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
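
For left-anchored patterns, one commonly used option is a plain btree index with the text_pattern_ops operator class. That this is the "simpler solution" meant above is my assumption; the index name is hypothetical, and the users table and f_unaccent() from above are assumed to exist.

```sql
-- btree expression index that supports left-anchored LIKE patterns
CREATE INDEX users_unaccent_name_pattern_idx ON users
(f_unaccent(name) text_pattern_ops);

-- The pattern has no leading wildcard, so the btree index is applicable.
SELECT * FROM users
WHERE  f_unaccent(name) LIKE 'Joa%';
```

Such an index is smaller and cheaper to maintain than a trigram index, but it does not help with patterns that start with a wildcard.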

pg_trgm also provides useful operators for "similarity" (%) and "distance" (<->).
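
As a rough illustration of those operators, a fuzzy search on the unaccented name could look like this. It assumes pg_trgm is installed and that the users table and f_unaccent() from above exist; the LIMIT is arbitrary.

```sql
SELECT name
     , similarity(f_unaccent(name), 'Joao') AS sml
FROM   users
WHERE  f_unaccent(name) % 'Joao'            -- similar enough, per the pg_trgm threshold
ORDER  BY f_unaccent(name) <-> 'Joao'       -- closest matches first
LIMIT  10;
```

The % operator can be supported by GIN and GiST trigram indexes, while ordering by <-> can only be assisted by a GiST index.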

Trigram indexes also support simple regular expressions with ~ et al. and case insensitive pattern matching with ILIKE:

PostgreSQL accent + case insensitive search
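
Two sketches of such queries against the trigram expression index, assuming the users table, f_unaccent() and a trigram index on f_unaccent(name) as above; the patterns are only examples.

```sql
-- Accent-insensitive and case-insensitive substring match
SELECT * FROM users
WHERE  f_unaccent(name) ILIKE '%joao%';

-- Case-insensitive regular expression match
SELECT * FROM users
WHERE  f_unaccent(name) ~* 'jo[a-z]o';
```

Both compare against the unaccented expression, so the search strings themselves should be supplied without accents or passed through f_unaccent() as well.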
