sys.dm_fts_parser sql full text
Question
We are having a really hard time figuring out why two similar strings passed to sys.dm_fts_parser give different results.
select * from sys.dm_fts_parser('"0 CAD"', 0, null, 0)
seems to think that "0 CAD" is one token (returns 2 tokens)
select * from sys.dm_fts_parser('"0 cad"', 0, null, 0)
returns 3 tokens - correctly
More importantly, and even more confusing, is why
select * from Table where contains(*, '"point 5 CAD"')
works and
select * from Table where contains(*, '"point 5 cad"')
fails, where the column searched contains "point 5 CAD".
Shouldn't the full-text index builder either ignore noise words (e.g. "5") based on the index setting, or include them?
We have tried both and can't explain why "nnnn CAD" is something special.
Note that full-text search is supposed to be case-insensitive according to http://msdn.microsoft.com/en-us/library/ms142583.aspx
What am I missing?
Edit: Using SQL 2012 11.0.2218
Solution
When using SQL 2008:
select * from sys.dm_fts_parser('"0 CAD"', 0, null, 0) - gives 2 tokens
select * from sys.dm_fts_parser('"0 CAD"', 1033, null, 0) - gives 3 tokens
On SQL 2012 (11.0.3218):
select * from sys.dm_fts_parser('"0 CAD"', 1033, null, 0) - gives 2 tokens
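To make the difference between the two language settings easier to see, the calls above can be combined into one result set. This is only a sketch: it assumes the login is allowed to call sys.dm_fts_parser (which normally requires sysadmin membership), and the parser_language label is a name made up for illustration. The number of rows returned depends on the SQL Server version and the installed word breaker.

```sql
-- Sketch: tokenize the same phrase with the neutral word breaker (LCID 0)
-- and the English word breaker (LCID 1033), side by side.
-- Arguments: query string, LCID, stoplist id (NULL = no stoplist),
-- accent sensitivity (0 = insensitive).
SELECT 'lcid 0' AS parser_language,   -- illustrative label, not a real column
       occurrence, special_term, display_term
FROM sys.dm_fts_parser(N'"0 CAD"', 0, NULL, 0)
UNION ALL
SELECT 'lcid 1033',
       occurrence, special_term, display_term
FROM sys.dm_fts_parser(N'"0 CAD"', 1033, NULL, 0)
ORDER BY parser_language, occurrence;
```

Running the same query with '"0 cad"' in place of '"0 CAD"' shows whether the casing changes the tokens on a given server.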
In SQL 2012, Microsoft introduced a new word breaker (version 14.0.4763.1000): http://msdn.microsoft.com/en-us/library/gg509108.aspx
It seems that the word breaker now recognizes three-character ISO 4217 currency codes, and if a number precedes the three-character code, the pair is not broken up. That would presumably explain why the uppercase "0 CAD" is kept together while the lowercase "0 cad" is split.
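To check which word breaker a given server actually uses, the registered full-text components can be listed. This is a minimal sketch using the system procedure sp_help_fulltext_system_components and the catalog view sys.fulltext_languages. The version shown on a particular instance may differ from the one quoted above.

```sql
-- Sketch: list the registered word breakers, including their version
-- and the path of the DLL that implements each one.
EXEC sp_help_fulltext_system_components 'wordbreaker';

-- Confirm that English (LCID 1033) is available as a full-text language.
SELECT lcid, name
FROM sys.fulltext_languages
WHERE lcid = 1033;
```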