如何在 Postgresql 中用 SQL 查询带有标点符号的单词? [英] How do I SQL query for words with punctuation in Postgresql?

查看:121
本文介绍了如何在 Postgresql 中用 SQL 查询带有标点符号的单词?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我在数据库中存储了这样的字符串/短语:

If I have strings/phrases like this stored in the database:

  • 什么是 Q 型操作?
  • 程序员指南
  • A.B.C 的编码

有没有办法像 "Programmers""abc""q-type" 一样传递查询参数并有它找到 "Programmer's", "ABC""Q-type"?

Is there a way to pass a query parameter in like "Programmers" or "abc" or "q-type" and have it find "Programmer's", "A.B.C" and "Q-type"?

推荐答案

tsvector

使用 tsvector 类型,它是 PostgreSQL 文本搜索功能的一部分.

tsvector

Use the tsvector type, which is part of the PostgreSQL text-search feature.

postgres> select 'What are Q-type Operations?'::tsvector;
              tsvector               
-------------------------------------
 'Operations?' 'Q-type' 'What' 'are'
(1 row)

您也可以在 tsvectors 上使用熟悉的运算符:

You can use familiar operators on tsvectors as well:

postgres> select 'What are Q-type Operations?'::tsvector
postgres>        || 'A.B.C''s of Coding'::tsvector;
                           ?column?                           
--------------------------------------------------------------
 'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'

来自 tsvector 文档:

tsvector 值是不同词素的排序列表,这些词是经过规范化以合并同一词的不同变体的词(有关详细信息,请参阅第 12 章).输入时自动进行排序和去重

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input

如果您还想进行特定于语言的规范化,例如删除常用词('the'、'a' 等)和乘法,请使用 to_tsvector 函数.它还为文本搜索的不同单词分配权重:

If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector function. It also assigns weights to different words for text search:

postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
                      to_tsvector                       
--------------------------------------------------------
 'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)

全面的文本搜索

显然,对查询中的每一行执行此操作会很昂贵——因此您应该将 tsvector 存储在单独的列中并使用 ts_query() 来搜索它.这也允许您在 tsvector 上创建 GiST 索引.

Full-blown text search

Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.

postgres> insert into text (phrase, tsvec)
postgres>   values('What are Q-type Operations?',
postgres>   to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1

使用 tsquery 和 @@ 运算符完成搜索:

Searching is done using tsquery and the @@ operator:

postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
           phrase            
-----------------------------
 What are Q-type Operations?
(1 row)

这篇关于如何在 Postgresql 中用 SQL 查询带有标点符号的单词?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆