Can I configure PostgreSQL programmatically to not eliminate stop words in full-text search?

Question

I'm using PostgreSQL full text search for a project where traditional stop words ('a', 'the', 'if' etc.) should be indexed and searchable, which is not the default behaviour. For example, I might want my users to find results for the query 'to be or not to be'.

The documentation indicates that I could achieve this by creating an empty stopwords dictionary in $SHAREDIR/tsearch_data/english.stop (for example), but this will complicate deployment; I want to be able to configure PostgreSQL's stop word handling with SQL. Is this possible? If so, can you provide a sample SQL statement?

Recommended answer

As per your comment on the previous answer, you can easily switch between using no stop words and all stop words. You can achieve this with a custom search configuration:

(1) You can create a custom dictionary without using the stop words file, for example:

CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    LANGUAGE = english
);

Note that the StopWords parameter is left out above, so this dictionary stems words without discarding any of them.
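To see the effect of leaving out StopWords, the dictionary can be tested on its own with ts_lexize. This is a minimal sketch that assumes english_stem_nostop from step 1 already exists and compares it with the built-in english_stem dictionary:

```sql
-- The built-in dictionary treats 'the' as a stop word and returns an empty array.
SELECT ts_lexize('english_stem', 'the');

-- The custom dictionary has no stop word list, so 'the' comes back as a lexeme.
SELECT ts_lexize('english_stem_nostop', 'the');

-- Stemming still applies as usual, e.g. 'searching' is reduced to 'search'.
SELECT ts_lexize('english_stem_nostop', 'searching');
```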

(2) Then create a new configuration to use your new dictionary:

CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.english_nostop
   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop;
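A quick way to check that the new configuration behaves as intended is to compare it with the stock one on a phrase made up entirely of stop words. The sketch below assumes both statements from step 2 have been run; the phrase is the one from the question:

```sql
-- Stock configuration: every word is a stop word, so the tsvector is empty.
SELECT to_tsvector('english', 'to be or not to be');

-- Custom configuration: the words should all be retained with their positions.
SELECT to_tsvector('english_nostop', 'to be or not to be');

-- ts_debug shows, per token, which token type it was assigned and which
-- dictionary produced which lexemes, which helps confirm the mapping.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english_nostop', 'to be or not to be');
```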

(3) Then, when searching, specify the config you want to use (alternatively, you can change the default_text_search_config option each time), e.g.:

SELECT
    title
FROM
    articles
WHERE
    to_tsvector('english_nostop', COALESCE(title,'') || ' ' || COALESCE(body,''))
    @@ to_tsquery('english_nostop', 'how & to');

You can specify just 'english' in the above SQL to use the normal config.

Note that running this example with the standard 'english' configuration produces a NOTICE, because the query contains only stop words.
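The alternative mentioned in step 3, changing default_text_search_config instead of naming the configuration in every call, might look like the following. It is only a sketch and reuses the articles table and the english_nostop configuration from the steps above:

```sql
-- Make the no-stop-word configuration the default for the current session.
SET default_text_search_config = 'public.english_nostop';

-- The single-argument forms now use that default configuration.
SELECT title
FROM articles
WHERE to_tsvector(COALESCE(title,'') || ' ' || COALESCE(body,''))
      @@ to_tsquery('how & to');

-- Switch back to the stock configuration when stop words should be dropped again.
SET default_text_search_config = 'pg_catalog.english';
```

One trade-off to keep in mind: the single-argument to_tsvector depends on a session setting, so it cannot be used in an expression index. Naming the configuration explicitly, as in step 3, avoids that.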

However, keep the following in mind:

  • If you are using indexes, you will need two - one for each configuration. (see these docs: tsearch tables and triggers).
  • Double check which parser tokens you want to use this mapping as per step #2, above (see Parsers).
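Both points can be sketched in SQL. The index names below are made up for illustration, and the indexed expression is copied from the query in step 3 so that the planner can match it:

```sql
-- One GIN expression index per configuration (index names are hypothetical).
CREATE INDEX articles_fts_english_idx ON articles
    USING gin (to_tsvector('english', COALESCE(title,'') || ' ' || COALESCE(body,'')));

CREATE INDEX articles_fts_english_nostop_idx ON articles
    USING gin (to_tsvector('english_nostop', COALESCE(title,'') || ' ' || COALESCE(body,'')));

-- List the token types produced by the default parser, to decide which of them
-- should be mapped to english_stem_nostop in step 2.
SELECT alias, description
FROM ts_token_type('default');
```

A query only benefits from an index whose expression uses the same configuration, which is why each configuration needs its own index.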