MongoDB - Difference between index on text field and text index?

Question

For a MongoDB field that contains strings (for example, state or province names), what (if any) difference is there between creating an index on a string-type field:

db.collection.createIndex( { field: 1 } )

and creating a text index on that field:

db.collection.createIndex( { field: "text" } )

In both cases, field is of string type.

I'm looking for a way to do a case-insensitive search on a text field which would contain a single word (maybe more). Being new to Mongo, I'm having trouble distinguishing between the two index methods above, and even something like a $regex search.

Recommended answer

The two index options are very different.

  • When you create a regular index on a string field, it indexes the entire value of the string. This is mostly useful for single-word strings (such as a username for logins) where you match the exact value.

A text index, on the other hand, will tokenize and stem the content of the field. It breaks the string into individual words or tokens, and then reduces them to their stems so that variants of the same word will match ("talk" matching "talks", "talked" and "talking", for example, as "talk" is the stem of all three). This is mostly useful for true text (sentences, paragraphs, etc.).
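To make the contrast concrete, here is a rough JavaScript sketch of what each kind of index would store for the same string. It is only an illustration: `regularIndexKeys`, `toyStem` and `textIndexKeys` are made-up helper names, and the suffix stripping is a naive stand-in for MongoDB's real language-aware stemming and stop-word handling, not a description of it.

```javascript
// Toy illustration only: MongoDB's real text index uses language-specific
// stemming and stop-word rules, not this naive suffix stripper.

// A regular index stores one key per document: the whole string value.
function regularIndexKeys(value) {
  return [value];
}

// Naive English "stemmer" (hypothetical helper) that strips a few suffixes.
function toyStem(word) {
  if (word.endsWith("ies")) return word.slice(0, -3) + "y";
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.endsWith(suffix) && word.length > suffix.length + 2) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// A text index stores one key per distinct stemmed token.
// Lowercasing mirrors the fact that $text is case-insensitive by default.
function textIndexKeys(value) {
  const tokens = value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return [...new Set(tokens.map(toyStem))];
}

console.log(regularIndexKeys("She talked while talking"));
// one key: the full string "She talked while talking"
console.log(textIndexKeys("She talked while talking"));
// three keys: "she", "talk", "while"
```

With the regular index, only a query for the full string finds the document. With the text-style keys, a search for "talks" is stemmed to "talk" as well and so matches.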

Text Search

Text search supports the search of string content in documents of a collection. MongoDB provides the $text operator to perform text search in queries and in aggregation pipelines.

The text search process:

  • tokenizes and stems the search term(s) during both the index creation and the text command execution.
  • assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.

The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
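As a minimal sketch of what such a query might look like from the shell: the collection name `articles`, the field name `body`, and the wrapper function `searchArticles` are assumptions made up for this example. `db` is expected to be a mongosh-style database handle.

```javascript
// Hypothetical names: an `articles` collection with text stored in `body`.
const textIndexSpec = { body: "text" };

// Matches documents whose indexed text contains a word that stems to the
// same stem as "blueberries" (so "blueberry" matches, "blue" does not).
const query = { $text: { $search: "blueberries" } };

// Expose the relevance score so results can be sorted by it.
const scoreProjection = { score: { $meta: "textScore" } };

// Runs the search against a mongosh-style `db` handle.
function searchArticles(db) {
  db.articles.createIndex(textIndexSpec); // no-op if the index already exists
  return db.articles.find(query, scoreProjection).sort(scoreProjection);
}
```

In mongosh you could call `searchArticles(db)` directly, or just type the equivalent `db.articles.find(...)` call by hand.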

  • $regex searches can be used with regular indexes on string fields to provide some pattern matching and wildcard search. $regex is not a terribly efficient user of indexes, but it will use them where it can:

    If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range.
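The "range" idea can be sketched in plain JavaScript. This is a conceptual illustration under simplifying assumptions, not MongoDB's actual implementation: `prefixRange` and `scanPrefix` are hypothetical helpers, and the sorted array stands in for the ordered keys of a regular index. In the shell, the corresponding prefix query would look something like `db.collection.find({ field: /^New/ })`.

```javascript
// Conceptual sketch: turn a literal prefix into a half-open key range
// [lower, upper). Ignores edge cases such as an empty prefix.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// Stand-in for index keys, which an index keeps in sorted order.
const indexedStates = [
  "Nebraska", "Nevada", "New Hampshire", "New Jersey",
  "New Mexico", "New York", "North Carolina", "Ohio",
].sort();

// Only keys inside the range can match the prefix. A real index would seek
// straight to `lower` instead of filtering every key as this sketch does.
function scanPrefix(sortedKeys, prefix) {
  const { lower, upper } = prefixRange(prefix);
  return sortedKeys.filter((key) => key >= lower && key < upper);
}

console.log(prefixRange("New"));
console.log(scanPrefix(indexedStates, "New"));
```

A pattern without a fixed prefix (for example `/York/`) gives no such range, so every key in the index has to be tested against the expression.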

  • http://docs.mongodb.org/manual/core/index-text/

    http://docs.mongodb.org/manual/reference/operator/query/regex/
