SQL Server full-text search for phrase containing a hyphen doesn't return expected results
Problem description
We have an application that uses a SQL Server 2008 database and full-text search. I'm trying to understand why the following searches behave differently:
First, a phrase containing a hyphenated word, like this:
contains(column_name, '"one two-three-four five"')
And second, an identical phrase, where the hyphens are replaced by spaces:
contains(column_name, '"one two three four five"')
The full-text index uses the ENGLISH (1033) locale, and the default system stoplist.
From my observations of other full-text searches containing hyphenated words, the first one should allow for matches on either "one two three four five" or "one twothreefour five". Instead, it only matches "one twothreefour five" (and not "one two-three-four five").
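One way to see what the English word breaker does with each search phrase is to run it through sys.dm_fts_parser, which is available in SQL Server 2008 and requires sysadmin membership. This is only a diagnostic sketch: the stoplist id 0 (system stoplist) and accent sensitivity 0 are assumptions chosen to mirror the setup described above, and the rows returned depend on the installed word breaker.

```sql
-- Tokens the English (1033) word breaker produces for the hyphenated phrase.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive; an assumption for this test).
select keyword, group_id, phrase_id, occurrence, special_term, display_term, expansion_type, source_term
from sys.dm_fts_parser(N'"one two-three-four five"', 1033, 0, 0);

-- The same phrase with the hyphens replaced by spaces, for comparison.
select keyword, group_id, phrase_id, occurrence, special_term, display_term, expansion_type, source_term
from sys.dm_fts_parser(N'"one two three four five"', 1033, 0, 0);
```

Comparing the occurrence and display_term columns of the two result sets shows which terms and positions each query asks the index for.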
Test Case
Setup:
create table ftTest
(
Id int identity(1,1) not null,
Value nvarchar(100) not null,
constraint PK_ftTest primary key (Id)
);
insert ftTest (Value) values ('one two-three-four five');
insert ftTest (Value) values ('one twothreefour five');
create fulltext catalog ftTest_catalog;
create fulltext index on ftTest (Value language 1033)
key index PK_ftTest on ftTest_catalog;
GO
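Full-text population runs in the background after the index is created, so queries issued straight after the setup can come back empty. A minimal sketch for checking that population has finished and for looking at what was actually indexed, assuming the ftTest objects created above exist in the current database:

```sql
-- 0 means the catalog is idle, i.e. population is not running.
select fulltextcatalogproperty('ftTest_catalog', 'PopulateStatus') as PopulateStatus;

-- Terms stored in the full-text index for ftTest, with the number of rows containing each.
select display_term, column_id, document_count
from sys.dm_fts_index_keywords(db_id(), object_id('ftTest'))
order by display_term;
```

The display_term values show how the two inserted rows were broken into terms at index time, which can then be compared with how the query phrase is parsed.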
Queries:
--returns one match
select * from ftTest where contains(Value, '"one two-three-four five"')
--returns two matches
select * from ftTest where contains(Value, '"one two three four five"')
select * from ftTest where contains(Value, 'one and "two-three-four five"')
select * from ftTest where contains(Value, '"one two-three-four" and five')
GO
Cleanup:
drop fulltext index on ftTest;
drop fulltext catalog ftTest_catalog;
drop table ftTest;
http://support.microsoft.com/default.aspx?scid=kb;en-us;200043
"Where non-alphanumeric character must be used in the search criteria (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates."
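Applied to the test case above, that advice might look like the following sketch. It has to run before the cleanup statements, and combining CONTAINS with LIKE in the second query is an illustrative assumption, not something the KB article prescribes.

```sql
-- LIKE compares the hyphens literally, so only the row with 'two-three-four' matches.
select * from ftTest where Value like '%one two-three-four five%';

-- Optional combination: the full-text predicate (the space-separated phrase from
-- the question, which matched both rows) narrows the candidates, and LIKE then
-- checks the exact punctuation.
select * from ftTest
where contains(Value, '"one two three four five"')
  and Value like '%one two-three-four five%';
```

A LIKE pattern with a leading wildcard cannot seek on an ordinary index, so on a large table the first query scans every row. The second form lets the full-text index do most of the filtering.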