MongoDB - Difference between index on text field and text index?


Question

For a MongoDB field that contains strings (for example, state or province names), what difference (if any) is there between creating a regular index on that string-type field:

db.collection.createIndex( { field: 1 } )

and creating a text index on that field:

db.collection.createIndex( { field: "text" } )

Where, in both cases, field is of string type.

I'm looking for a way to do a case-insensitive search on a text field which would contain a single word (maybe more). Being new to Mongo, I'm having trouble distinguishing between using the above two index methods, and even something like a $regex search.

Recommended answer

The two index options are very different.


  • When you create a regular index on a string field it indexes the entire value in the string. Mostly useful for single word strings (like a username for logins) where you can match exactly.

A text index, on the other hand, will tokenize and stem the content of the field. It breaks the string into individual words or tokens, and further reduces them to their stems so that variants of the same word will match ("talk" matching "talks", "talked" and "talking" for example, as "talk" is the stem of all three). Mostly useful for true text (sentences, paragraphs, etc).
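To make the contrast concrete, here is a toy sketch in plain JavaScript. It is not how MongoDB is implemented: the `toyStem` function is a naive suffix stripper invented for illustration (MongoDB uses real language-specific stemmers and also drops stop words), and all function names are hypothetical.

```javascript
// Toy illustration only: naive suffix stripping, not MongoDB's real stemmer.
function tokenize(text) {
  // Lowercase, then split on anything that is not a letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function toyStem(word) {
  // Strip a few common English suffixes; leave short words alone.
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function textTerms(text) {
  return new Set(tokenize(text).map(toyStem));
}

// Regular index: the whole string is one key, so an equality match is exact.
function regularIndexMatches(storedValue, query) {
  return storedValue === query;
}

// Text index: a match if any stemmed query term is among the stemmed document terms.
function textIndexMatches(storedValue, query) {
  const docTerms = textTerms(storedValue);
  return [...textTerms(query)].some((term) => docTerms.has(term));
}

const doc = "She talked about New York";
console.log(regularIndexMatches(doc, "talking")); // false
console.log(textIndexMatches(doc, "talking"));    // true
console.log(textIndexMatches(doc, "TALKS"));      // true
```

Because the tokens are lowercased before stemming, the text-style match is also case-insensitive, which is the behaviour asked about in the question.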


Text Search

Text search supports the search of string content in documents of a collection. MongoDB provides the $text operator to perform text search in queries and in aggregation pipelines.

The text search process:

  • tokenizes and stems the search term(s) during both the index creation and the text command execution.
  • assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.

The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
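As a rough sketch of what such a query looks like, the helper below builds the documents for a `$text` search sorted by relevance score. The helper name `buildTextQuery` and the collection name `recipes` are hypothetical; `$text`, `$search` and `$meta: "textScore"` are the MongoDB operators.

```javascript
// Builds the filter, projection and sort documents for a $text query.
// buildTextQuery is a hypothetical helper, not part of MongoDB.
function buildTextQuery(searchString) {
  return {
    filter: { $text: { $search: searchString } },
    // Expose the relevance score computed by the text search.
    projection: { score: { $meta: "textScore" } },
    // Most relevant documents first.
    sort: { score: { $meta: "textScore" } },
  };
}

const q = buildTextQuery("blueberries");

// In the mongo shell, assuming a text index exists on the (hypothetical)
// recipes collection, the pieces would be used like this:
//   db.recipes.find(q.filter, q.projection).sort(q.sort)
console.log(JSON.stringify(q.filter)); // {"$text":{"$search":"blueberries"}}
```

With a text index in place, this search would match a document containing "blueberry", while a search for "blue" would not.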


  • $regex searches can be used with regular indexes on string fields, to provide some pattern matching and wildcard search. Not a terribly effective user of indexes but it will use indexes where it can:


    If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range.
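    The "range" idea can be sketched in plain JavaScript. This is only an illustration of the principle, not MongoDB's planner code: the function names are hypothetical, the prefix is passed in by hand rather than parsed from the regex, and plain ASCII keys are assumed.

```javascript
// Every string starting with the prefix sorts inside [lower, upper), where
// upper is the prefix with its last character bumped by one code unit.
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// Walk a sorted list of index keys, testing the regex only inside the range.
function scanWithPrefix(sortedKeys, regex, prefix) {
  const { lower, upper } = prefixRange(prefix);
  const matches = [];
  let examined = 0;
  for (const key of sortedKeys) {
    if (key < lower) continue; // a real B-tree would seek here directly
    if (key >= upper) break;   // past the range: nothing further can match
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { matches, examined };
}

const keys = ["Alabama", "Alaska", "Arizona", "California", "Colorado", "Texas"].sort();
const result = scanWithPrefix(keys, /^Al/, "Al");
console.log(result.matches.join(", ")); // Alabama, Alaska
console.log(result.examined);           // 2
```

    Only two of the six keys are tested against the regex. A case-insensitive pattern such as /^al/i has no single contiguous range like this, which is why such queries generally cannot use the index as effectively.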


  • http://docs.mongodb.org/manual/core/index-text/

    http://docs.mongodb.org/manual/reference/operator/query/regex/
