SQL搜索的关键字 [英] Keyword to SQL search

查看:300
本文介绍了SQL搜索的关键字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用案例

当用户访问我的网站时,他们将面临一个类似SO的搜索框。他们可以使用计划文本搜索结果。 .net问题,封闭的问题,.net和java等。搜索的功能有点不同,因为它会尝试尽可能多的数据库模式,而不是一个直全文搜索。所以.net问题只会搜索.net问题,而不是.net答案(可能不适用于SO case,这里只是一个例子),封闭问题将返回关闭的问题,.net和java问题将返回与.net和java等相关的问题。

When a user goes to my website, they will be confronted with a search box much like SO. They can search for results using plan text. ".net questions", "closed questions", ".net and java", etc.. The search will function a bit different that SO, in that it will try to as much as possible of the schema of the database rather than a straight fulltext search. So ".net questions" will only search for .net questions as opposed to .net answers (probably not applicable to SO case, just an example here), "closed questions" will return questions that are closed, ".net and java" questions will return questions that relate to .net and java and nothing else.

问题

我不太熟悉的话,但我基本上想做一个关键字到SQL驱动的搜索。我知道数据库的模式,我也可以数据库数据库。我想知道任何现有的方法存在之前,我试图实现这一点。我想这个问题是针对所述问题的好设计。

I'm not too familiar with the words but I basically want to do a keyword to SQL driven search. I know the schema of the database and I also can datamine the database. I want to know any current approaches there that existing out already before I try to implement this. I guess this question is for what is a good design for the stated problem.

建议

我建议的解决方案到目前为止看起来像这样

My proposed solution so far looks something like this


  1. 清理输入。只需删除任何特殊字符

  2. 将输入解析为数据块。将c#java的输入转换为c#和java也将c#java问题的特殊情况转换为c#java和问题。


  3. 将数据绑定到元数据中。因此,转换封闭式问题的内容,并将其与表格的已关闭列相关联。

  4. 将该树转换为sql查询。

  1. Clean the input. Just remove any special characters
  2. Parse the input into chunks of data. Break an input of "c# java" into c# and java Also handle the special cases like "'c# java' questions" into 'c# java' and "questions".
  3. Build a tree out of the input
  4. Bind the data into metadata. So convert stuff like closed questions and relate it to the isclosed column of a table.
  5. Convert the tree into a sql query.

想法/建议/链接?

推荐答案

我运行一个数字音乐商店,搜索,根据关键字的出现次数和产品出现的模式对关键字进行加权,例如。具有不同的列,例如艺术家,标题或出版商。

I run a digital music store with a "single search" that weights keywords based on their occurrences and the schema in which Products appear, eg. with different columns like "Artist", "Title" or "Publisher".

产品也与相册和播放列表相关,但为了更简单的说明,

Products are also related to albums and playlists, but for simpler explanation, I will only elaborate on the indexing and querying of Products' Keywords.

关键字 code> table - 每个可能被搜索的每个单词的加权表(因此,它被引用到某处),每个记录具有以下数据:

Keywords table - a weighted table for every word that could possibly be searched for (hence, it is referenced somewhere) with the following data for each record:


  • 关键字ID(不是单词),

  • 词本身

  • li>
  • 重量

ProductKeywords 对于每个产品字段(或列)中每个记录引用的每个关键字,每个记录都有以下数据:

ProductKeywords table - a weighted table for every keyword referenced by any of a product's fields (or columns) with the following data for each record:


  • 产品ID, / li>
  • 关键字ID,

  • 权重

加权值表示字词出现的频率。具有较低权重的匹配关键字更独特,并且更有可能是正在搜索的关键字。以这种方式,经常出现的词语被自动向下加权,例如。 the,a或I。

The weighting value is an indication of how often the words occurs. Matching keywords with a lower weight are "more unique" and are more likely to be what is being searched for. In this way, words occurring often are automatically "down-weighted", eg. "the", "a" or "I". However, it is best to strip out atomic occurrences of those common words before indexing.

我使用整数进行加权,但使用十进制值将提供更多的通用性,可能与。

I used integers for weighting, but using a decimal value will offer more versatility, possibly with slightly slower sorting.

每当任何产品字段更新时,艺术家或标题(不会经常发生),数据库触发器在事务中重新索引产品的关键字:

Whenever any product field is updated, eg. Artist or Title (which does not happen that often), a database trigger re-indexes the product's keywords like so inside a transaction:


  1. 全部

  2. 每个索引字段(例如Artist)值作为关键字存储/检索为整个关键字,并与 ProductKeywords 表,以便直接匹配。

  3. 然后,关键字weight将增加一个取决于字段重要性的值。您可以根据字段的重要性来添加,减去权重。如果艺术家比标题更重要,请从其 ProductKeyword 重量调整中减去1或2。

  4. 每个索引字段值将被删除非字母数字字符,并分割成单独的字组,例如。

  5. 每个字段值的每个单独的字组都是soundexed并存储/检索为关键字,并以相同的方式与产品相关联如果某个关键字已与某个产品相关联,那么该关键字的权重将被简单调整。

  1. All product keywords are disassociated and deleted if no longer referenced.
  2. Each indexed field (eg. Artist) value is stored/retrieved as a keyword in its entirety and related to the product in the ProductKeywords table for a direct match.
  3. The keyword weight is then incremented by a value that depends on the importance of the field. You can add, subtract weight based on the importance of the field. If Artist is more important than Title, Subtract 1 or 2 from its ProductKeyword weight adjustment.
  4. Each indexed field value is stripped of any non-alphanumeric characters and split into separate word groups, eg. "Billy Joel" becomes "Billy" and "Joel".
  5. Each separate word group for each field value is soundexed and stored/retrieved as a keyword and associated with the product in the same way as in step 2. If a keyword has already been associated with a product, its weight is simply adjusted.



查询



Querying


  1. 以输入查询搜索字符串作为整体并查找直接匹配关键字。

  2. 删除所有非字母数字字符,并将查询拆分为关键字。检索所有现有关键字(只有少数匹配)。

  3. 重复步骤2,但使用soundex值,调整权重以减少相关性。

  4. 将检索到的产品关键字加入其相关产品,并检索每个产品的销售量,这是一种衡量人气的衡量标准。

  5. 按关键字权重,ProductKeyword重量和销售。

  6. 限制结果并将产品搜索结果返回给客户。

  1. Take the input query search string in its entirety and look for a direct matching keyword. Retrieve all ProductKeywords for the keyword in an in-memory table along with Keyword weight (different from ProductKeyword weight).
  2. Strip out all non-alphanumeric characters and split query into keywords. Retrieve all existing keywords (only a few will match). Join ProductKeywords to matching keywords to in-memory table along with Keyword weight, which is different from the ProductKeyword weight.
  3. Repeat Step 2 but use soundex values instead, adjusting weights to be less relevant.
  4. Join retrieved ProductKeywords to their related Products and retrieve each product's sales, which is a measure of popularity.
  5. Sort results by Keyword weight, ProductKeyword weight and Sales. The final summing/sorting and/or weighting depends on your implementation.
  6. Limit results and return product search results to client.

这篇关于SQL搜索的关键字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆