MongoDB full-text index turns out to be slower than a regex

Views: 159  Posted: 2017/9/5 23:03:56

Problem description

Question

Version:

version:
3.4.3

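The collection holds on the order of millions of documents in total.
Create the text index:

db.tests.createIndex({'a':'text'}) // the values of field a are large, over 1024 characters

Query:

db.tests.find({'$text':{'$search':'ere'}}).explain('executionStats')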

"executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 80018,
        "executionTimeMillis" : 306877,
        "totalKeysExamined" : 83546,
        "totalDocsExamined" : 83546,
        "executionStages" : {
            "stage" : "TEXT",
            "nReturned" : 80018,
            "executionTimeMillisEstimate" : 306699,
            "works" : 167095,
            "advanced" : 80018,
            "needTime" : 87076,
            "needYield" : 0,
            "saveState" : 16525,
            "restoreState" : 16525,
            "isEOF" : 1,
            "invalidates" : 0,
            "indexPrefix" : {
                
            },
            "indexName" : "a_text",
            "parsedTextQuery" : {
                "terms" : [
                    "ii"
                ],
                "negatedTerms" : [ ],
                "phrases" : [
                    "ere"
                ],
                "negatedPhrases" : [ ]
            },
            "textIndexVersion" : 3,
            "inputStage" : {
                "stage" : "TEXT_MATCH",
                "nReturned" : 80018,
                "executionTimeMillisEstimate" : 306649,
                "works" : 167095,
                "advanced" : 80018,
                "needTime" : 87076,
                "needYield" : 0,
                "saveState" : 16525,
                "restoreState" : 16525,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsRejected" : 3528,
                "inputStage" : {
                    "stage" : "TEXT_OR",
                    "nReturned" : 83546,
                    "executionTimeMillisEstimate" : 305932,
                    "works" : 167095,
                    "advanced" : 83546,
                    "needTime" : 83548,
                    "needYield" : 0,
                    "saveState" : 16525,
                    "restoreState" : 16525,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "docsExamined" : 83546,
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "nReturned" : 83546,
                        "executionTimeMillisEstimate" : 1103,
                        "works" : 83547,
                        "advanced" : 83546,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 16525,
                        "restoreState" : 16525,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "keyPattern" : {
                            "_fts" : "text",
                            "_ftsx" : 1
                        },
                        "indexName" : "a_text",
                        "isMultiKey" : true,
                        "isUnique" : false,
                        "isSparse" : false,
                        "isPartial" : false,
                        "indexVersion" : 2,
                        "direction" : "backward",
                        "indexBounds" : {
                            
                        },
                        "keysExamined" : 83546,
                        "seeks" : 1,
                        "dupsTested" : 83546,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0
                    }

I don't quite understand why the TEXT_OR and TEXT_MATCH stages are needed.

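Ideally a text index should be faster than a regex; otherwise, what is the point of having one? Yet running the same search as a regex gives the result below, and it is the same whether or not the a_text index exists (mongo was restarted between runs to avoid hot data).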

db.tests.find({'a':{'$regex':'ere','$options':'i'}}).explain('executionStats')

"executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 81319,
        "executionTimeMillis" : 101701,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 4256954,
        "executionStages" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "a" : {
                    "$regex" : "ere",
                    "$options" : "i"
                }
            },
            "nReturned" : 81319,
            "executionTimeMillisEstimate" : 101391,
            "works" : 4256956,
            "advanced" : 81319,
            "needTime" : 4175636,
            "needYield" : 0,
            "saveState" : 33964,
            "restoreState" : 33964,
            "isEOF" : 1,
            "invalidates" : 0,
            "direction" : "forward",
            "docsExamined" : 4256954
        }

Am I doing something wrong here? Any pointers would be appreciated. Thanks.

Solution

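1. Seeing TEXT_OR and TEXT_MATCH stages in a text search plan is normal.

2. Your regex query is case-insensitive, so it does not use an index and instead performs a full collection scan (COLLSCAN).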
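
To see which stage the time is attributed to in an explain output like the one in the question, one option is to walk the `inputStage` chain and subtract each child's estimate from its parent's. The following is a minimal sketch under that assumption; `stageTimes` is a hypothetical helper, not part of MongoDB, and it only handles plans where every stage has a single `inputStage`.

```javascript
// Hypothetical helper (not a MongoDB API): walks the single-child "inputStage"
// chain of an explain("executionStats") plan and reports, for each stage, the
// cumulative time estimate and the part attributable to that stage alone.
function stageTimes(executionStages) {
  const rows = [];
  for (let s = executionStages; s; s = s.inputStage) {
    const total = s.executionTimeMillisEstimate;
    const child = s.inputStage ? s.inputStage.executionTimeMillisEstimate : 0;
    rows.push({ stage: s.stage, totalMs: total, ownMs: total - child });
  }
  return rows;
}

// Stage names and time estimates copied from the text-search explain output
// in the question; all other fields are omitted for brevity.
const textPlan = {
  stage: "TEXT", executionTimeMillisEstimate: 306699,
  inputStage: {
    stage: "TEXT_MATCH", executionTimeMillisEstimate: 306649,
    inputStage: {
      stage: "TEXT_OR", executionTimeMillisEstimate: 305932,
      inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 1103 }
    }
  }
};

stageTimes(textPlan).forEach(function (r) {
  console.log(r.stage, "total:", r.totalMs, "own:", r.ownMs);
});
```

In the mongo shell the same function could be fed `db.tests.find(...).explain('executionStats').executionStats.executionStages` (using `print` instead of `console.log`). On the numbers from the question, nearly all of the estimated time lands on TEXT_OR, the stage that reports `docsExamined: 83546`, while the index scan itself accounts for roughly one second. That suggests the cost lies in fetching the large documents rather than in scanning the index.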

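So before drawing conclusions from this comparison, it is worth checking:

1. the size of the text index;

2. the size of the collection.

If the text index is larger than the collection itself, a text index may not be a good fit here.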
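
Both sizes can be read from the output of `db.tests.stats()`, which reports the data size in `size` and a per-index map in `indexSizes`. A minimal sketch of the comparison follows; `compareIndexToCollection` is a hypothetical helper, and the sample numbers are made up purely for illustration (they are not measurements from the question).

```javascript
// Hypothetical helper (not a MongoDB API): compares one index's size with the
// collection's data size, using fields from the output of db.collection.stats().
function compareIndexToCollection(stats, indexName) {
  const indexBytes = stats.indexSizes[indexName];
  if (indexBytes === undefined) {
    throw new Error("no such index in stats.indexSizes: " + indexName);
  }
  return {
    indexBytes: indexBytes,
    dataBytes: stats.size,              // uncompressed size of the documents
    ratio: indexBytes / stats.size,
    indexIsLarger: indexBytes > stats.size
  };
}

// Made-up numbers for illustration only; in the mongo shell, pass db.tests.stats().
const sampleStats = {
  size: 4000000000,
  indexSizes: { _id_: 40000000, a_text: 6000000000 }
};

console.log(compareIndexToCollection(sampleStats, "a_text"));
```

In the mongo shell the call would be `compareIndexToCollection(db.tests.stats(), "a_text")`, with `printjson` in place of `console.log`.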

In addition, the documentation says:

If you specify a language value of "none", then the text search uses
simple tokenization with no list of stop words and no stemming.

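The term 'ere' that you searched for does not carry much meaning as a word. Try setting $language: "none" to turn off stemming and the like, and see whether performance improves.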
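
A sketch of what that might look like, assuming the collection and index names from the question (`tests`, `a_text`). `default_language` is an option of text indexes and `$language` a field of `$text`. Because a collection can hold only one text index, the existing one has to be dropped before it is recreated. `rebuildWithoutStemming` is a hypothetical wrapper, written as a function so that nothing destructive runs until it is called on purpose.

```javascript
// Index definition and query using the "none" language:
// simple tokenization, no stop-word list, no stemming.
const indexKeys = { a: "text" };
const indexOptions = { default_language: "none" };
// $language is optional here: when omitted, the query uses the index's
// default language. It is spelled out to make the intent explicit.
const textQuery = { $text: { $search: "ere", $language: "none" } };

// Hypothetical wrapper: drops the old text index, recreates it with
// default_language "none", and returns the explain output for the query.
// Rebuilding a text index on millions of large documents can take a while.
function rebuildWithoutStemming(db) {
  db.tests.dropIndex("a_text");   // only one text index is allowed per collection
  db.tests.createIndex(indexKeys, indexOptions);
  return db.tests.find(textQuery).explain("executionStats");
}
```

In the mongo shell this would be invoked as `printjson(rebuildWithoutStemming(db))`, after which `executionTimeMillis` can be compared with the earlier run.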

For your reference.

Love MongoDB! Have fun!
