Is there a way to usefully index a text column containing regex patterns?

Question

I'm using PostgreSQL, currently version 9.2 but I'm open to upgrading.

In one of my tables, I have a column of type text that stores regex patterns.

CREATE TABLE foo (
    id serial,
    pattern text,
    PRIMARY KEY(id)
);
CREATE INDEX foo_pattern_idx ON foo(pattern);

Then I query like this:

INSERT INTO foo (pattern) VALUES ('^abc.*$');

SELECT * FROM foo WHERE 'abc literal string' ~ pattern;

I understand that this is sort of a reverse LIKE or reverse pattern match. If it was the other, more common way, if my haystack was in the database, and my needle was anchored, I could use a btree index more or less effectively depending on the exact search pattern and data.

But the data that I have is a table of patterns and other data associated with the patterns. I need to ask the database which rows have patterns that match my query text. Is there a way to make this more efficient than a sequential scan that checks every row in my table?

Recommended answer

There is no way.

Indexes require IMMUTABLE expressions. The result of your expression depends on the input string. I don't see any other way than to evaluate the expression for every row, meaning a sequential scan.
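
One way to check this on your own data is to ask the planner directly. The following is a minimal sketch that reuses the `foo` table from the question. The plan you get depends on your PostgreSQL version, table statistics and settings, so no output is shown here.

```sql
-- Ask the planner how it would run the reverse match from the question.
EXPLAIN
SELECT * FROM foo WHERE 'abc literal string' ~ pattern;

-- Discourage sequential scans for this session and look again.
-- The regex predicate can at best show up as a Filter that is evaluated
-- per row; it cannot become an Index Cond on foo_pattern_idx.
SET enable_seqscan = off;
EXPLAIN
SELECT * FROM foo WHERE 'abc literal string' ~ pattern;
RESET enable_seqscan;
```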

Related answer with more details on the IMMUTABLE angle:

  • Does PostgreSQL support "accent insensitive" collations?

Just that there is no workaround for your case, which is impossible to index. The index needs to store constant values in its tuples, which is just not available because the resulting value for every row is computed based on the input. And you cannot transform the input without looking at the column value.

Postgres index usage is bound to operators, and only indexes on expressions to the left of the operator can be used (due to the same logical constraints). More:

  • Can PostgreSQL index array columns?
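
To make the contrast concrete, here is a sketch of the common direction next to the one from the question. The table `bar` and the index `bar_haystack_idx` are hypothetical names introduced only for illustration; `foo` is the table from the question. Plans vary with version and data, so none are shown.

```sql
-- Hypothetical table for the common direction: the haystack is the column.
CREATE TABLE bar (
    id serial PRIMARY KEY,
    haystack text
);

-- text_pattern_ops lets a btree index support left-anchored patterns
-- regardless of the database locale.
CREATE INDEX bar_haystack_idx ON bar (haystack text_pattern_ops);

-- Indexed column on the left, constant left-anchored pattern on the right:
-- the planner can derive a range condition on the index from the prefix 'abc'.
EXPLAIN SELECT * FROM bar WHERE haystack ~ '^abc';

-- The question's direction: constant on the left, column on the right.
-- No such range can be derived, since every row holds a different pattern.
EXPLAIN SELECT * FROM foo WHERE 'abc literal string' ~ pattern;
```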

Many operators define a COMMUTATOR, which allows the query planner / optimizer to flip the indexed expression to the left. Simple example: the commutator of = is =, and the commutator of > is <, and vice versa. The documentation:

the index-scan machinery expects to see the indexed column on the left of the operator it is given.
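
For comparison, the commutators of the simple comparison operators can be read from the same catalog. This is a sketch restricted to `text` operands; it relies only on the system catalog `pg_operator` and the `regoperator` type, and the second statement reuses the `foo` table from the question.

```sql
-- List the commutator of =, < and > for text operands.
-- oprcom holds the OID of the commutator operator, or 0 if there is none.
SELECT o.oid::regoperator    AS operator,
       o.oprcom::regoperator AS commutator
FROM   pg_operator o
WHERE  o.oprname IN ('=', '<', '>')
AND    o.oprleft  = 'text'::regtype
AND    o.oprright = 'text'::regtype;

-- Because > has the commutator <, the planner is free to treat
-- "5 > id" as "id < 5" and match it against the primary key index.
EXPLAIN SELECT * FROM foo WHERE 5 > id;
```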

The regular expression match operator ~ has no commutator, again, because that's not possible. See for yourself:

SELECT oprname, oprright::regtype, oprleft::regtype, oprcom
FROM   pg_operator
WHERE  oprname = '~'
AND    'text'::regtype IN (oprright, oprleft);

 oprname | oprright |  oprleft  | oprcom
---------+----------+-----------+------------
 ~       | text     | name      | 0
 ~       | text     | text      | 0
 ~       | text     | character | 0
 ~       | text     | citext    | 0

And consult the manual here:

oprcom ... Commutator of this operator, if any
...
Unused column contain zeroes. For example, oprleft is zero for a prefix operator.

I have tried before and had to accept that it's impossible on principle.
