SQL Server全文索引意外的结果 [英] SQL-Server Full Text Index Unexpected results

查看:146
本文介绍了SQL Server全文索引意外的结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

MS SQL SERVER 2012



我在数据库上配置了全文索引,对于主要部分来说似乎正常工作。我有一个包含以下文本的关键字字段:

 兽医产品Beaphar Fiprotec斑点小狗67mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2 

试图选择行时,我得到意想不到的结果

  select * from产品CONTAINS(关键字,'235 *'')
选择*从CONTAINS(关键字,'87112311437_ldzr_2 *'')

两者均返回预期结果。这告诉我索引设置和下划线字符不是问题。然而,我没有得到以下结果:

  select * from CONTAINS(keywords,'235_ *'')
选择* CONTAINS(关键字,235_ldzr_2 *')

的产品我有自动更改跟踪打开,并禁用它,手动重建索引,重新启用它,仍然没有运气。我甚至已经着手评估字符串的字符来获取ascii代码,以防特殊字符通过导入进入。该术语不会出现在停止列表中。我现在坚持为什么这个项目不显示。



我在一个单独的数据库中创建了一个副本,并且无法重现该问题,我真的没有在这里发布代码 - 因为我不能显示错误。



有没有人有任何进一步检查我可以执行或知道任何其他影响全文搜索功能的点?



@HoneyBadger
我缩小了范围。数字后面跟着数字,下划线似乎有问题。

  DELETE FROM TestFullTextSearch 
插入TestFullTextSearch值(1,'235_ldzr_2 testing' )
插入TestFullTextSearch值(2,'test 235_ldzr_2 testing')
插入TestFullTextSearch值(3,'A 235_ldzr_2 testing')
插入TestFullTextSearch值(4,'1 235_ldzr_2 testing ')
insert into TestFullTextSearch values(5,'12 235_ldzr_2 testing')
insert into TestFullTextSearch values(6,'123 235_ldzr_2 testing')

SELECT *
来自TestFullTextSearch
其中contains(AllText,'235 *')

SELECT *
from TestFullTextSearch
其中contains(AllText,'235_ *')

1,2,3行全部按预期返回。 4,5,6不包括下划线时。

解决方案

SQL Server有一个单词和符号列表,它们是被认为是噪声,这些都被排除在索引之外,你不能专门搜索它。你可以编辑这个列表,或者你可以创建一个新列表。请参阅这里介绍。



显然,您甚至可以在文件级别对其进行编辑,请参阅这里



编辑:

我已经能够重现该问题:

  create table TestFullTextSearch(
Id int not null,
AllText nvarchar(400)


在TestFullTextSearch(Id)上创建唯一索引test_tfts;
创建全文目录ftcat_tfts;
在TestFullTextSearch(AllText)上创建全文索引
key index在ftcat_tfts上的test_tfts
with change_tracking auto,stoplist关闭


插入TestFullTextSearch值(1 ,'legacyreport Report Legacy 23049823490 20150713 Cardiac US')
插入到TestFullTextSearch值(2,'123-45-678 foo bar 19450712 20020723 Exercise Stress US')
插入TestFullTextSearch值(3,'2048 jj goodguy xy2000 19490328 20150721 Cardiac US')
插入到TestFullTextSearch值(4,'12345678 4.0 ALLCALCS 19650409 20031103 Cardiac Difficult US')
插入TestFullTextSearch值(5,'兽医产品Beaphar Fiprotec Spot On Small Dog 67mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2')

在这张表上,如果我执行

  select * 
from TestFullTextSearch
其中contains(AllText,'235_ldzr_2 *')

我没有得到任何结果。但是,如果我添加了一个反斜杠:

  select * 
FROM TestFullTextSearch
WHERE CONTAINS(AllText,' 235 \_ldzr_2 *')

我确实收到了结果!



我不明白。如果我添加另一行:

 插入TestFullTextSearch值(6,'Veterinary Products Beaphar Fiprotec Spot On Small Dog 67_mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2')

我搜索 WHERE CONTAINS(AllText,'我得到的结果如预期的那样...其他测试也不会显示与 contains相同的行为(AllText,'235_ldzr_2 * ')



顺便说一下,我的LCID = 1033。



另外一个测试案例:

 插入TestFullTextSearch值(15,'Veterinary Products Beaphar Fiprotec Spot Small Dog 100_ldzr_2 x 1 225_ldzr_2 Beaphar 87112311437_ldzr_2')

我可以用这个记录找到

  CONTAINS(AllText,'100_ *')



<但是,这个记录是在 100_ldzr 前添加 x 1

  Inse rt进入TestFullTextSearch值(16,'Veterinary Products Beaphar Fiprotec Spot On Small Dog x 1 100_ldzr_2 x 1 225_ldzr_2 Beaphar 87112311437_ldzr_2')

我找不到相同的 CONTAINS ,但是如果添加反斜杠,我可以找到它。



结论(现在)



看起来可能会出现一个问题,即数字后跟一个空格,然后是结尾的数字下划线。考虑以下几点:

  select * from sys.dm_fts_parser('x 235_ *',1033,0,0)

结果为2个搜索字词:x和235 _


但是:
$ b

  select * from sys.dm_fts_parser(''1 235_ *',1033 ,0,0)

结果有7个搜索词:1 235,1235, 1,235和_。 (1和235也出现为nn1 / nn235)。



缺少235_解释了为什么找不到它。当我添加一个反斜杠时,它被解释为字分隔符,匹配 235 \ _ 235 code> _ )。



我非常有信心这是原因。解决方案可能更难。

MS SQL SERVER 2012

I have a full text index configured on a database which for the main part seems to be functioning correctly. I have a keyword field that contains the following text:

 Veterinary Products Beaphar Fiprotec Spot On Small Dog 67mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2

When trying to select the row I am getting unexpected results

select * from products where CONTAINS(keywords,'"235*"')
select * from products where CONTAINS(keywords,'"87112311437_ldzr_2*"')

Both return the expected result. This tells me that the index is setup and the underscore character is not an issue. However I get no results for:

select * from products where CONTAINS(keywords,'"235_*"')
select * from products where CONTAINS(keywords,'"235_ldzr_2*"')

I have automatic change tracking switched on, and have disabled it, rebuilt the index manually, re-enabled it, and still no luck. I have even gone to the point of evaluating the string character by character to get the ascii codes in case a special character had made it in via the import. The term does not appear in the stop list. I am now stuck as to why this item does not show.

I have created a copy in a seperate database and have not been able to reproduce the problem, so can't really post code here - as I cannot show the error.

Does anyone have any further checks I can perform or know of any other points that affect the full text search functionality?

@HoneyBadger I have narrowed it down. there seems to be a problem around digits followed by digits with an underscore. Try the following on your table structure.

DELETE FROM TestFullTextSearch
insert into TestFullTextSearch values (1, '235_ldzr_2 testing ')
insert into TestFullTextSearch values (2, 'test 235_ldzr_2 testing ')
insert into TestFullTextSearch values (3, 'A 235_ldzr_2 testing ')
insert into TestFullTextSearch values (4, '1 235_ldzr_2 testing ')
insert into TestFullTextSearch values (5, '12 235_ldzr_2 testing ')
insert into TestFullTextSearch values (6, '123 235_ldzr_2 testing ')

SELECT * 
from   TestFullTextSearch 
where  contains(AllText, '"235*"')

SELECT * 
from   TestFullTextSearch 
where  contains(AllText, '"235_*"')

Rows 1,2,3 all return as expected. 4,5,6 don't when the underscore is included.

解决方案

SQL Server has a list of words and symbols which are considered noise, these are excluded from the index, and you can't search for it specifically. You can edit this list, or you can create a new one. See here for the how-to.

Apparently, you can even edit it at file level, see here.

Edit:

I've been able to reproduce the problem:

create table TestFullTextSearch (
Id int not null,
AllText nvarchar(400)
)

create unique index test_tfts on TestFullTextSearch(Id);
create fulltext catalog ftcat_tfts;
create fulltext index on TestFullTextSearch(AllText)
key index test_tfts on ftcat_tfts
with change_tracking auto, stoplist off
go

insert into TestFullTextSearch values (1, 'legacyreport Report Legacy 23049823490  20150713 Cardiac US ')
insert into TestFullTextSearch values (2, '123-45-678 foo bar  19450712 20020723 Exercise Stress US ')
insert into TestFullTextSearch values (3, '2048 jj goodguy xy2000 19490328 20150721 Cardiac US ')
insert into TestFullTextSearch values (4, '12345678 4.0 ALLCALCS  19650409 20031103 Cardiac Difficult US ')
insert into TestFullTextSearch values (5, 'Veterinary Products Beaphar Fiprotec Spot On Small Dog 67mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2 ')

On this table, if I execute

select * 
from   TestFullTextSearch 
where  contains(AllText, '"235_ldzr_2*"')

I get no results. However, if I add a backslash:

select  * 
FROM    TestFullTextSearch 
WHERE   CONTAINS(AllText, '"235\_ldzr_2*"')

I do get a result!

I don't understand though. If I add another row:

insert into TestFullTextSearch values (6, 'Veterinary Products Beaphar Fiprotec Spot On Small Dog 67_mg x 1 235_ldzr_2 Beaphar 87112311437_ldzr_2 ')

And I search for WHERE CONTAINS(AllText, '"67_*"'), I get the result back as expected... Other tests as well, don't show the same behaviour as contains(AllText, '"235_ldzr_2*"').

By the way, my LCID = 1033.

One more test case:

Insert into TestFullTextSearch values (15, 'Veterinary Products Beaphar Fiprotec Spot On Small Dog 100_ldzr_2 x 1 225_ldzr_2 Beaphar 87112311437_ldzr_2 ')

This record I can find with

CONTAINS(AllText, '"100_*"')

However, this record, where I added x 1 in front of the 100_ldzr:

Insert into TestFullTextSearch values (16, 'Veterinary Products Beaphar Fiprotec Spot On Small Dog x 1 100_ldzr_2 x 1 225_ldzr_2 Beaphar 87112311437_ldzr_2 ')

I cannot find with the same CONTAINS, but I can find it if I add a backslash.

Conclusion (for now)

It seems there can be an issue when numbers are followed by a space, then followed by numbers ending in an underscore. Consider the following:

select * from sys.dm_fts_parser('"x 235_*"', 1033, 0, 0)

This results in 2 search terms: "x" and "235_"

This however:

select * from sys.dm_fts_parser('"1 235_*"', 1033, 0, 0)

Results in seven search terms: "1 235", "1235", "1", "235", and "_". (1 and 235 also occur as nn1/nn235).

The absence of "235_" explains why it could not be found. When I added a backslash, it was interpreted as word divider , matching 235\_ to 235 (and to _).

I'm pretty confident this is the cause. The solution is probably more difficult.

这篇关于SQL Server全文索引意外的结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆