MongoDB Full and Partial Text Search

Question

Environment:

  • MongoDB (3.2.0) with mongos

Collection:

  • users

Text index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);

  userCollection.createIndex(keys, options); // using MongoTemplate


Document:

  • { "name" : "LEONEL" }

Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)

Any idea why I get 0 results when querying for "LEO" or "L"?

Regex with Text Index Search is not allowed.

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results


MongoDB Documentation:

  • Text Search
  • $text
  • Text Indexes
  • Improve Text Indexes to support partial word match

Recommended answer

As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
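
To make this concrete, here is a toy model in JavaScript of how a term-based index answers a query. It is only a sketch and not MongoDB's implementation: the function names (`toTerms`, `buildIndex`, `search`) are made up for illustration, and stemming is left out entirely. It only shows that, once case and diacritics are folded away, a query term has to equal a stored term exactly, so a prefix such as "LEO" has nothing to match.

```javascript
// Toy model only: hypothetical helpers, no stemming, not MongoDB internals.

// Fold case and diacritics, then split the text into whole-word terms.
function toTerms(text) {
  return text
    .toLowerCase()
    .normalize("NFD")                  // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining accents
    .split(/[^\p{L}\p{N}]+/u)          // break on anything that is not a letter or digit
    .filter(Boolean);
}

// Map each whole term to the set of document positions that contain it.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const term of toTerms(doc.name)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  });
  return index;
}

// A query term matches only when it equals a stored term exactly.
function search(index, query) {
  const hits = new Set();
  for (const term of toTerms(query)) {
    const ids = index.get(term);
    if (ids) ids.forEach((id) => hits.add(id));
  }
  return [...hits];
}

const demoIndex = buildIndex([{ name: "LEONEL" }]);
for (const q of ["LEONEL", "leonel", "LEONÉL", "LEO", "L"]) {
  console.log(q, search(demoIndex, q).length > 0 ? "FOUND" : "NOT FOUND");
}
```

In this model the first three queries all fold to the stored term "leonel", while "LEO" and "L" fold to "leo" and "l", which are not terms in the index.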

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

  • Efficient Techniques for Fuzzy and Partial matching in MongoDB by John Page
  • Efficient Partial Keyword Searches by James Tan
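
As a rough illustration of what a "different approach" can look like (not necessarily what the articles above describe), the sketch below shows two common building blocks in JavaScript: an anchored prefix `$regex` filter against a pre-folded copy of the field, and a list of word prefixes that could be stored alongside the document. The field name `nameLower` and all helper names are assumptions made for this example. A case-sensitive regex anchored with `^` can make use of a regular index on the field, which is why the value is folded to lowercase up front instead of using the `i` option.

```javascript
// Sketch under assumptions: "nameLower" is a hypothetical field holding a
// lowercased, accent-free copy of "name", kept in sync by the application.

// Lowercase and strip diacritics so stored values and queries compare equal.
function foldForSearch(text) {
  return text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "");
}

// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter document for a "starts with" match on the folded field.
function prefixQuery(field, input) {
  return { [field]: { $regex: "^" + escapeRegex(foldForSearch(input)) } };
}

// Alternative: precompute every prefix of a word (from minLength up) so that
// a partial match becomes an exact match on an indexed array field.
function edgeNgrams(word, minLength = 1) {
  const folded = foldForSearch(word);
  const grams = [];
  for (let len = minLength; len <= folded.length; len++) {
    grams.push(folded.slice(0, len));
  }
  return grams;
}

console.log(JSON.stringify(prefixQuery("nameLower", "LEO")));
console.log(edgeNgrams("LEONEL", 3));
// In the mongo shell the filter could be used as:
//   db.users.find(prefixQuery("nameLower", "LEO"))
```

The prefix list trades extra storage and write cost for simple equality lookups, whereas the anchored regex needs only one extra field but covers prefixes only, not matches in the middle of a word.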

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
