MongoDB Full and Partial Text Search

Question

Environment:

  • MongoDB (3.2.0) with Mongoose

Collection:

  • users

Text index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);
  
  userCollection.createIndex(keys, options); // using MongoTemplate


Document:

  • {"name":"LEONEL"}

Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)

Any idea why I get 0 results when the query is "LEO" or "L"?

Regex with Text Index Search is not allowed.

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results


MongoDB documentation:

Recommended answer

As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to work that way. For example, "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.
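
To make the stemming idea concrete, here is a deliberately crude sketch in JavaScript. It is not the Snowball algorithm and not what MongoDB runs; the suffix list and the `toyStem` name are assumptions chosen only so that the three example words collapse to the same stem.

```javascript
// Toy suffix stripper. NOT Snowball and NOT MongoDB's stemmer:
// the suffix list below is an assumption made for this illustration.
const TOY_SUFFIXES = ["ful", "es", "s", "e"];

function toyStem(word) {
  let stem = word.toLowerCase();
  let changed = true;
  while (changed) {
    changed = false;
    for (const suffix of TOY_SUFFIXES) {
      // Only strip when at least three characters would remain.
      if (stem.endsWith(suffix) && stem.length - suffix.length >= 3) {
        stem = stem.slice(0, -suffix.length);
        changed = true;
        break;
      }
    }
  }
  return stem;
}

for (const word of ["taste", "tastes", "tasteful"]) {
  console.log(word, "->", toyStem(word));
}
```

Because all three words reduce to one stem, a search for any of them would hit a document containing any other, which can look like partial matching even though whole stems are being compared.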

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
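
The behaviour described above can be approximated with a simplified model, sketched here in JavaScript. This is an assumption-laden illustration, not MongoDB's implementation: it folds case and diacritics and compares whole terms, and it leaves out stemming and stopwords entirely. The names `foldTerm`, `buildTermSet`, and `matchesAnyTerm` are hypothetical.

```javascript
// Simplified model of whole-term matching (no stemming, no stopwords).
// Fold a term: remove combining diacritical marks, then lowercase.
function foldTerm(term) {
  return term.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Split text on whitespace and keep the folded terms in a set.
function buildTermSet(text) {
  return new Set(text.split(/\s+/).filter(Boolean).map(foldTerm));
}

// A search matches if any of its folded terms equals an indexed term.
function matchesAnyTerm(indexedTerms, search) {
  return search
    .split(/\s+/)
    .filter(Boolean)
    .some((term) => indexedTerms.has(foldTerm(term)));
}

const indexedTerms = buildTermSet("LEONEL");
for (const query of ["LEONEL", "leonel", "LEONÉL", "LEO", "L"]) {
  console.log(query, matchesAnyTerm(indexedTerms, query));
}
```

In this model "LEO" and "L" fail because they are different terms from "leonel", not because of case or accents. A shorter query could only match if stemming, which the sketch omits, reduced both words to the same stem.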

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

  • Efficient Techniques for Fuzzy and Partial matching in MongoDB by John Page
  • Efficient Partial Keyword Searches by James Tan
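
One commonly used way to get indexed prefix matches, which may or may not be what the articles above describe, is to store every prefix of the value in an array field and query that field by equality. The sketch below is JavaScript; the `namePrefixes` field and the helper names are hypothetical, and the shell commands in the comments assume a `users` collection like the one in the question.

```javascript
// Fold a string: remove combining diacritical marks, then lowercase.
function foldText(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Hypothetical helper: every prefix of the folded name, shortest first.
function namePrefixes(name, minLength = 1) {
  const folded = foldText(name);
  const prefixes = [];
  for (let end = minLength; end <= folded.length; end++) {
    prefixes.push(folded.slice(0, end));
  }
  return prefixes;
}

// Document shape with the extra, hypothetical `namePrefixes` field.
const userDoc = { name: "LEONEL", namePrefixes: namePrefixes("LEONEL") };
console.log(userDoc.namePrefixes);

// In the mongo shell this could be used roughly as follows (assumed usage):
//   db.users.insertOne(userDoc)
//   db.users.createIndex({ namePrefixes: 1 })   // ordinary index on the array
//   db.users.find({ namePrefixes: "leo" })      // fold the query the same way
```

The trade-off is extra storage and index size in exchange for prefix lookups that are plain equality matches. This sketch only covers prefixes of the whole string; multi-word names or mid-word matches would need a different token scheme.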

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
