SQL Server, ISABOUT, weighted terms


Question


I am trying to figure out exactly how weighted terms work in an ISABOUT query in SQL Server. Here is where I currently am:

Each query returns the following rows:

QUERY 1 (weight 1): Initial ranking

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (1) ) ') ORDER BY RANK DESC, [KEY]

KEY     RANK
306342  249
272619  156
221557  114

QUERY 2 (weight 0.8): Ranking increases, initial order is preserved

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.8) ) ') ORDER BY RANK DESC, [KEY]

 KEY     RANK
 306342  321
 272619  201
 221557  146

QUERY 3 (weight 0.2): Ranking increases, initial order is preserved

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.2) ) ') ORDER BY RANK DESC, [KEY]

 KEY    RANK
 306342 998
 272619 877
 221557 692

QUERY 4 (weight 0.17): Ranking decreases, best match is now last; the inverted behavior for these terms begins at 0.17

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.17) ) ') ORDER BY RANK DESC, [KEY]

 KEY      RANK
 272619   960
 221557   958
 306342   802

QUERY 5 (weight 0.16): Ranking increases, best match is now second

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.16) ) ') ORDER BY RANK DESC, [KEY]

 KEY      RANK
 272619   978
 306342   935
 221557   841

QUERY 6 (weight 0.01): Ranking decreases, best match is last again

SELECT * FROM CONTAINSTABLE(documentParts, title, 'ISABOUT ("e" weight (0.01) ) ') ORDER BY RANK DESC, [KEY]

 KEY    RANK
 221557 105
 272619 77
 306342 50

The best match at weight 1 has a rank of 249, and as the weight goes down to 0.2 the rank of the best match increases to 998. From 0.2 to 0.17 the rank decreases, and from 0.16 the results are inverted (the weight values that reproduce this behavior depend on the terms and maybe on the columns searched...)

It seems there is a point where the weight means the opposite, something like "do not include this term".
Do you have any explanation for this behavior?
Why does the ranking increase when the weight decreases?
Why does the ranking decrease after some point until the results are inverted, and how can you predict this point?

I use a custom "word-breaker" that creates the following query when a user searches for something:

CONTAINSTABLE(documentParts, title, 
      'ISABOUT (
          "wordA wordB wordC" weight (0.8), 
          "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6), 
          "wordA*" weight (0.1), 
          "wordB*" weight (0.1), 
          "wordC*" weight (0.1)
       ) ')

Should I expect big ranks for the 0.1 words?

Is the following query the same as the one above, and should I expect some weird behavior with the 0.1 rankings?

CONTAINSTABLE(documentParts, title, '
      ISABOUT ( "wordA wordB wordC" weight (0.8) )
      OR ISABOUT ( "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.6) )
      OR ISABOUT ( "wordA*" weight (0.1) )
      OR ISABOUT ( "wordB*" weight (0.1) )
      OR ISABOUT ( "wordC*" weight (0.1) )
      ')

EDIT:
I found this topic: http://msdn.microsoft.com/en-us/library/ms142524(v=sql.105).aspx which answers some of my questions, but raises some new ones!

I am searching in two tables, "documents" and "documentParts", and use a UNION ALL to sum the ranks and get my results. According to this article that is wrong, since the indexed rows are counted when computing the ranking, so summing RANK values from the two tables would be like adding apples and carrots...

My solution for now is to compute a percentage for each CONTAINSTABLE like this:

Log(RANK) / Log(Sum(RANK) OVER( PARTITION BY 1)) AS [PERCENT]

and then sum on this value instead of the raw RANK...
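
To make the formula concrete, here is a minimal sketch that applies it to the three rows returned by QUERY 1. The table variable `@hits` is a hypothetical stand-in for a single CONTAINSTABLE result, and the `NULLIF` guard is an addition that is not part of the original expression.

```sql
-- Sketch only: @hits stands in for one CONTAINSTABLE result set,
-- filled with the three rows returned by QUERY 1 above.
DECLARE @hits TABLE ([KEY] int PRIMARY KEY, [RANK] int NOT NULL);

INSERT INTO @hits ([KEY], [RANK])
VALUES (306342, 249), (272619, 156), (221557, 114);

SELECT
    [KEY],
    [RANK],
    -- SUM(...) OVER () totals RANK across the whole result set,
    -- the same window as PARTITION BY 1 in the formula above.
    -- NULLIF (an addition) avoids a divide-by-zero when the total is 1,
    -- because LOG(1) = 0.
    LOG([RANK]) / NULLIF(LOG(SUM([RANK]) OVER ()), 0) AS [PERCENT]
FROM @hits
ORDER BY [PERCENT] DESC;
```

Because the logarithm is increasing, the order within one result set stays the same as ordering by RANK; only the scale changes, which is what the per-table percentage is meant to achieve before summing.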

Solution

In my experience, I have had the best results when the weights add up to 1.

CONTAINSTABLE(documentParts, content, 
          'ISABOUT (
              "wordA wordB wordC" weight (0.5), 
              "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2), 
              "wordA*" weight (0.1), 
              "wordB*" weight (0.1), 
              "wordC*" weight (0.1) 
           ) ')
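
For context, one way such a weighted search condition might be used in a full query is sketched below. It assumes that documentParts has a full-text index on the content column and that its full-text key column is named `id`; that column name is hypothetical and does not appear in the question.

```sql
-- Sketch only: assumes a full-text index on documentParts(content)
-- and a full-text key column named id (hypothetical name).
DECLARE @search nvarchar(4000) = N'ISABOUT (
    "wordA wordB wordC" weight (0.5),
    "wordA*" NEAR "wordB*" NEAR "wordC*" weight (0.2),
    "wordA*" weight (0.1),
    "wordB*" weight (0.1),
    "wordC*" weight (0.1)
)';

SELECT dp.id, ft.[RANK]
FROM CONTAINSTABLE(documentParts, content, @search) AS ft
JOIN documentParts AS dp
    ON dp.id = ft.[KEY]   -- [KEY] holds the full-text key of the matching row
ORDER BY ft.[RANK] DESC, ft.[KEY];
```

Keeping the search condition in a variable makes it easier to try different weight combinations against the same query and compare the resulting ranks.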
