带前缀和干扰词的SQL Server全文搜索 [英] SQL Server full-text search with prefix and noise word

查看:95
本文介绍了带前缀和干扰词的SQL Server全文搜索的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我不明白为什么在Query1和Query2之后不返回相似的结果集.

I do not understand why following Query1 and Query2 don't return similar result sets.

select * from dbo.tFTS_test where contains (*, '"qwe-asd*"')

返回:

Id          Value
----------- ----------

(0 row(s) affected)

查询2:

select * from dbo.tFTS_test where contains (*, '"qwe-asd"')

返回:

Id          Value
----------- ----------
Informational: The full-text search condition contained noise word(s).
1           qwe-asd

(1 row(s) affected)

表格:

select * from dbo.tFTS_test

返回:

Id          Value
----------- ----------
1           qwe-asd

(1 row(s) affected)

以下是一些辅助查询

Helper Query1:

select * from sys.dm_fts_index_keywords_by_document (db_id(), object_id('dbo.tFTS_test'))

返回:

keyword                         display_term  column_id   document_id          occurrence_count
------------------------------- ------------- ----------- -------------------- ----------------
0x007100770065                  qwe           2           1                    1
0x007100770065002D006100730064  qwe-asd       2           1                    1
0xFF                            END OF FILE   2           1                    1

(3 row(s) affected)

Helper Query2:

select p.*
from sys.fulltext_stoplists s
cross apply sys.dm_fts_parser ('"qwe-asd"', 1033, s.stoplist_id, 0) p
where s.name = 'FTS_test_stoplist'

返回:

keyword                         group_id    phrase_id   occurrence  special_term     display_term  expansion_type source_term
------------------------------- ----------- ----------- ----------- ---------------- ------------- -------------- -----------
0x007100770065002D006100730064  1           0           1           Exact Match      qwe-asd       0              qwe-asd
0x007100770065                  1           0           1           Exact Match      qwe           0              qwe-asd
0x006100730064                  1           0           2           Noise Word       asd           0              qwe-asd

(3 row(s) affected)

Helper Query3:

select p.*
from sys.fulltext_stoplists s
cross apply sys.dm_fts_parser ('"qwe-asd*"', 1033, s.stoplist_id, 0) p
where s.name = 'FTS_test_stoplist'

返回:

keyword                         group_id    phrase_id   occurrence  special_term     display_term  expansion_type source_term
------------------------------- ----------- ----------- ----------- ---------------- ------------- -------------- -----------
0x007100770065002D006100730064  1           0           1           Exact Match      qwe-asd       0              qwe-asd
0x007100770065                  1           0           1           Exact Match      qwe           0              qwe-asd
0x006100730064                  1           0           2           Exact Match      asd           0              qwe-asd

(3 row(s) affected)

结构如下:

-- ****************************
-- Step 1. Cleanup FTS Structure
-- ****************************

if exists (select 1 from sys.fulltext_indexes where object_id = object_id('dbo.tFTS_test'))
    drop fulltext index on dbo.tFTS_test;
go
if exists (select 1 from sys.fulltext_catalogs where name = 'FTS_test')
    drop fulltext catalog FTS_test;
go
if exists (select 1 from sys.fulltext_stoplists where name = 'FTS_test_stoplist')
    drop fulltext stoplist FTS_test_stoplist;
go
if object_id ('dbo.tFTS_test') is not null
    drop table dbo.tFTS_test;
go

-- ****************************
-- Step 2. Create FTS Structure
-- ****************************

create table dbo.tFTS_test (
    Id int not null,
    Value varchar(100) not null,
    constraint [PK_tFTS_test] primary key clustered (Id asc)
);
go
create fulltext stoplist FTS_test_stoplist from system stoplist;
go
alter fulltext stoplist FTS_test_stoplist add 'asd' language 'English';
go
create fulltext catalog FTS_test with accent_sensitivity = off;
go
create fulltext index on dbo.tFTS_test (Value language English) key index PK_tFTS_test on (FTS_test);
go
if not exists (
    select 1
    from sys.fulltext_indexes i
    inner join sys.fulltext_stoplists l on l.stoplist_id = i.stoplist_id
    where i.object_id = object_id('dbo.tFTS_test') and l.name = 'FTS_test_stoplist'
)
    alter fulltext index on dbo.tFTS_test set stoplist FTS_test_stoplist;
go
insert into dbo.tFTS_test (Id, Value) values (1, 'qwe-asd');
go

P.S.抱歉,这么大的问题.

推荐答案

差异是由连字符引起的,全文搜索将查询字符串视为两个单词而不是一个单词.另外,由于"asd"是一个干扰词,因此找不到.

The difference is caused by the hyphen, Full Text Search is treating your query string as two words instead of one. And additionally, since "asd" is a noise word it will not find it.

当分词系统在术语中遇到分词系统时,分词系统会将其解析为空白字符.

When the word breaker encounters a word-breaking character in a term, the word breaker parses the character as a white space character.

断字字符包括以下内容:

Word-breaking characters include the following:

  • •$(美元符号)
  • ,(逗号)
  • &(&)
  • #(数字符号)

当分词系统在术语中遇到连字符(-)时,分词系统将正确解析该术语.但是,全文词库组件将由连字符连接的字符以及连字符本身视为空字符.例如,如果原始术语是知名名人",则该词在同义词库文件中显示为名人".

When the word breaker encounters a hyphen (-) in a term, the word breaker correctly parses the term. However, the full-text thesaurus component treats the characters that are connected by the hyphen together with the hyphen itself as empty characters. For example, if the original term is "well-known celebrity," the term appears as "celebrity" in the thesaurus file.

这是来自Microsoft网站,不是相同的问题,但有根本原因:

This is from the Microsoft site, not the same problem but one that share the root cause:

在运行使用SQL Server 2005中的同义词库文件的全文本搜索查询时,您会得到不正确的结果

这篇关于带前缀和干扰词的SQL Server全文搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆