带前缀和干扰词的SQL Server全文搜索 [英] SQL Server full-text search with prefix and noise word
问题描述
我不明白为什么在Query1和Query2之后不返回相似的结果集.
I do not understand why following Query1 and Query2 don't return similar result sets.
select * from dbo.tFTS_test where contains (*, '"qwe-asd*"')
返回:
Id Value
----------- ----------
(0 row(s) affected)
查询2:
select * from dbo.tFTS_test where contains (*, '"qwe-asd"')
返回:
Id Value
----------- ----------
Informational: The full-text search condition contained noise word(s).
1 qwe-asd
(1 row(s) affected)
表格:
select * from dbo.tFTS_test
返回:
Id Value
----------- ----------
1 qwe-asd
(1 row(s) affected)
以下是一些辅助查询
Helper Query1:
select * from sys.dm_fts_index_keywords_by_document (db_id(), object_id('dbo.tFTS_test'))
返回:
keyword display_term column_id document_id occurrence_count
------------------------------- ------------- ----------- -------------------- ----------------
0x007100770065 qwe 2 1 1
0x007100770065002D006100730064 qwe-asd 2 1 1
0xFF END OF FILE 2 1 1
(3 row(s) affected)
Helper Query2:
select p.*
from sys.fulltext_stoplists s
cross apply sys.dm_fts_parser ('"qwe-asd"', 1033, s.stoplist_id, 0) p
where s.name = 'FTS_test_stoplist'
返回:
keyword group_id phrase_id occurrence special_term display_term expansion_type source_term
------------------------------- ----------- ----------- ----------- ---------------- ------------- -------------- -----------
0x007100770065002D006100730064 1 0 1 Exact Match qwe-asd 0 qwe-asd
0x007100770065 1 0 1 Exact Match qwe 0 qwe-asd
0x006100730064 1 0 2 Noise Word asd 0 qwe-asd
(3 row(s) affected)
Helper Query3:
select p.*
from sys.fulltext_stoplists s
cross apply sys.dm_fts_parser ('"qwe-asd*"', 1033, s.stoplist_id, 0) p
where s.name = 'FTS_test_stoplist'
返回:
keyword group_id phrase_id occurrence special_term display_term expansion_type source_term
------------------------------- ----------- ----------- ----------- ---------------- ------------- -------------- -----------
0x007100770065002D006100730064 1 0 1 Exact Match qwe-asd 0 qwe-asd
0x007100770065 1 0 1 Exact Match qwe 0 qwe-asd
0x006100730064 1 0 2 Exact Match asd 0 qwe-asd
(3 row(s) affected)
结构如下:
-- ****************************
-- Step 1. Cleanup FTS Structure
-- ****************************
if exists (select 1 from sys.fulltext_indexes where object_id = object_id('dbo.tFTS_test'))
drop fulltext index on dbo.tFTS_test;
go
if exists (select 1 from sys.fulltext_catalogs where name = 'FTS_test')
drop fulltext catalog FTS_test;
go
if exists (select 1 from sys.fulltext_stoplists where name = 'FTS_test_stoplist')
drop fulltext stoplist FTS_test_stoplist;
go
if object_id ('dbo.tFTS_test') is not null
drop table dbo.tFTS_test;
go
-- ****************************
-- Step 2. Create FTS Structure
-- ****************************
create table dbo.tFTS_test (
Id int not null,
Value varchar(100) not null,
constraint [PK_tFTS_test] primary key clustered (Id asc)
);
go
create fulltext stoplist FTS_test_stoplist from system stoplist;
go
alter fulltext stoplist FTS_test_stoplist add 'asd' language 'English';
go
create fulltext catalog FTS_test with accent_sensitivity = off;
go
create fulltext index on dbo.tFTS_test (Value language English) key index PK_tFTS_test on (FTS_test);
go
if not exists (
select 1
from sys.fulltext_indexes i
inner join sys.fulltext_stoplists l on l.stoplist_id = i.stoplist_id
where i.object_id = object_id('dbo.tFTS_test') and l.name = 'FTS_test_stoplist'
)
alter fulltext index on dbo.tFTS_test set stoplist FTS_test_stoplist;
go
insert into dbo.tFTS_test (Id, Value) values (1, 'qwe-asd');
go
P.S.抱歉,这么大的问题.
推荐答案
差异是由连字符引起的,全文搜索将查询字符串视为两个单词而不是一个单词.另外,由于"asd"是一个干扰词,因此找不到.
The difference is caused by the hyphen, Full Text Search is treating your query string as two words instead of one. And additionally, since "asd" is a noise word it will not find it.
当分词系统在术语中遇到分词系统时,分词系统会将其解析为空白字符.
When the word breaker encounters a word-breaking character in a term, the word breaker parses the character as a white space character.
断字字符包括以下内容:
Word-breaking characters include the following:
- •$(美元符号)
- ,(逗号)
- &(&)
- #(数字符号)
当分词系统在术语中遇到连字符(-)时,分词系统将正确解析该术语.但是,全文词库组件将由连字符连接的字符以及连字符本身视为空字符.例如,如果原始术语是知名名人",则该词在同义词库文件中显示为名人".
When the word breaker encounters a hyphen (-) in a term, the word breaker correctly parses the term. However, the full-text thesaurus component treats the characters that are connected by the hyphen together with the hyphen itself as empty characters. For example, if the original term is "well-known celebrity," the term appears as "celebrity" in the thesaurus file.
这是来自Microsoft网站,不是相同的问题,但有根本原因:
This is from the Microsoft site, not the same problem but one that share the root cause:
在运行使用SQL Server 2005中的同义词库文件的全文本搜索查询时,您会得到不正确的结果
这篇关于带前缀和干扰词的SQL Server全文搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!