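Full-text index in MongoDB turns out to be slower than a regex
Problem description
Version: 3.4.3
Total data volume: on the order of millions of documents
Creating the full-text index:
db.tests.createIndex({'a':'text'}) // the values of field a are large, over 1024 characters
Query: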
db.tests.find({'$text':{'$search':'ere'}}).explain('executionStats')
"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 80018,
    "executionTimeMillis" : 306877,
    "totalKeysExamined" : 83546,
    "totalDocsExamined" : 83546,
    "executionStages" : {
        "stage" : "TEXT",
        "nReturned" : 80018,
        "executionTimeMillisEstimate" : 306699,
        "works" : 167095,
        "advanced" : 80018,
        "needTime" : 87076,
        "needYield" : 0,
        "saveState" : 16525,
        "restoreState" : 16525,
        "isEOF" : 1,
        "invalidates" : 0,
        "indexPrefix" : {
        },
        "indexName" : "a_text",
        "parsedTextQuery" : {
            "terms" : [
                "ii"
            ],
            "negatedTerms" : [ ],
            "phrases" : [
                "ere"
            ],
            "negatedPhrases" : [ ]
        },
        "textIndexVersion" : 3,
        "inputStage" : {
            "stage" : "TEXT_MATCH",
            "nReturned" : 80018,
            "executionTimeMillisEstimate" : 306649,
            "works" : 167095,
            "advanced" : 80018,
            "needTime" : 87076,
            "needYield" : 0,
            "saveState" : 16525,
            "restoreState" : 16525,
            "isEOF" : 1,
            "invalidates" : 0,
            "docsRejected" : 3528,
            "inputStage" : {
                "stage" : "TEXT_OR",
                "nReturned" : 83546,
                "executionTimeMillisEstimate" : 305932,
                "works" : 167095,
                "advanced" : 83546,
                "needTime" : 83548,
                "needYield" : 0,
                "saveState" : 16525,
                "restoreState" : 16525,
                "isEOF" : 1,
                "invalidates" : 0,
                "docsExamined" : 83546,
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "nReturned" : 83546,
                    "executionTimeMillisEstimate" : 1103,
                    "works" : 83547,
                    "advanced" : 83546,
                    "needTime" : 0,
                    "needYield" : 0,
                    "saveState" : 16525,
                    "restoreState" : 16525,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "keyPattern" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                    },
                    "indexName" : "a_text",
                    "isMultiKey" : true,
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 2,
                    "direction" : "backward",
                    "indexBounds" : {
                    },
                    "keysExamined" : 83546,
                    "seeks" : 1,
                    "dupsTested" : 83546,
                    "dupsDropped" : 0,
                    "seenInvalidated" : 0
                }
            }
        }
    }
}
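I don't quite understand why the TEXT_OR and TEXT_MATCH stages are needed.
Ideally a full-text index should be faster than a regex; otherwise, what is the full-text index for? Yet running the same search condition as a regex gives the result below, and it is the same whether or not the a_text index exists (I restarted mongo between runs to rule out hot data):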
db.tests.find({'a':{'$regex':'ere','$options':'i'}}).explain('executionStats')
"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 81319,
    "executionTimeMillis" : 101701,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 4256954,
    "executionStages" : {
        "stage" : "COLLSCAN",
        "filter" : {
            "a" : {
                "$regex" : "ere",
                "$options" : "i"
            }
        },
        "nReturned" : 81319,
        "executionTimeMillisEstimate" : 101391,
        "works" : 4256956,
        "advanced" : 81319,
        "needTime" : 4175636,
        "needYield" : 0,
        "saveState" : 33964,
        "restoreState" : 33964,
        "isEOF" : 1,
        "invalidates" : 0,
        "direction" : "forward",
        "docsExamined" : 4256954
    }
}
Am I going about this the wrong way? Any pointers would be appreciated. Thanks.
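1. It is normal for TEXT_OR and TEXT_MATCH to appear in a text search plan.
2. Your regex query is case-insensitive, so it did not use an index and ran as a full collection scan (COLLSCAN) instead.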
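To see where the text query actually spends its time, it can help to subtract each child's executionTimeMillisEstimate from its parent's. The following is only a rough sketch: stageSelfTimes is a hypothetical helper name, it assumes every stage has at most one inputStage (true for the plan posted in the question, not for plans with an inputStages array), and the estimates are cumulative figures reported by the server rather than exact timings.

```javascript
// Walk an explain("executionStats") stage chain and report, for each stage,
// its cumulative estimate and the part not accounted for by its child.
// Assumption: each stage has at most one child under "inputStage".
function stageSelfTimes(stage) {
  var rows = [];
  var node = stage;
  while (node) {
    var child = node.inputStage || null;
    var childMs = child ? child.executionTimeMillisEstimate : 0;
    rows.push({
      stage: node.stage,
      totalMs: node.executionTimeMillisEstimate,
      selfMs: node.executionTimeMillisEstimate - childMs
    });
    node = child;
  }
  return rows;
}

// Trimmed copy of the stage chain from the $text explain output in the question.
var textPlan = {
  stage: "TEXT", executionTimeMillisEstimate: 306699,
  inputStage: {
    stage: "TEXT_MATCH", executionTimeMillisEstimate: 306649,
    inputStage: {
      stage: "TEXT_OR", executionTimeMillisEstimate: 305932,
      inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 1103 }
    }
  }
};

// print() exists in the mongo shell, console.log in Node.
var log = typeof print === "function" ? print : console.log;

var selfTimes = stageSelfTimes(textPlan);
selfTimes.forEach(function (r) {
  log(r.stage + ": self " + r.selfMs + " ms of " + r.totalMs + " ms");
});
```

On the posted numbers, the IXSCAN itself accounts for roughly 1.1 s, while about 304.8 s is attributed to TEXT_OR, which is the stage that reports docsExamined. Against a live server the same helper could be fed db.tests.find(...).explain('executionStats').executionStats.executionStages.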
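To interpret your comparison, two more things are worth checking:
1. the size of the text index;
2. the size of the collection.
If the text index is larger than the collection itself, a text index may not be a good fit here.
In addition, the documentation says: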
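One way to make that size comparison concrete is to read both numbers from db.tests.stats(), which reports size (uncompressed data size) and indexSizes (a map from index name to size in bytes). The sketch below is a suggestion only: indexToDataRatio is a hypothetical helper, and the sample figures are invented for illustration, not measured on your data.

```javascript
// Ratio of one index's size to the collection's data size, given the
// document returned by db.<collection>.stats().
function indexToDataRatio(stats, indexName) {
  var indexBytes = stats.indexSizes[indexName];
  if (typeof indexBytes !== "number") {
    throw new Error("index not found in stats.indexSizes: " + indexName);
  }
  return indexBytes / stats.size;
}

// Hypothetical numbers for illustration only.
var GiB = 1024 * 1024 * 1024;
var sampleStats = {
  size: 8 * GiB,
  indexSizes: { _id_: 64 * 1024 * 1024, a_text: 12 * GiB }
};

var ratio = indexToDataRatio(sampleStats, "a_text");

// In the mongo shell, against the real collection:
// indexToDataRatio(db.tests.stats(), "a_text")
```

A ratio above 1 would mean the text index is larger than the data it indexes. Note that size is uncompressed while index sizes are on-disk figures, so storageSize may be the fairer denominator on a compressed storage engine.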
If you specify a language value of "none", then the text search uses
simple tokenization with no list of stop words and no stemming.
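The term 'ere' you searched for carries little meaning as a word. Try setting $language to "none", which turns off stemming and the like, and see whether performance improves.
For your reference.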
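A minimal sketch of what that experiment might look like follows. It assumes the language should be "none" on both sides: in the index (default_language) and in the query ($language). Since a collection can hold only one text index, the existing a_text index would have to be dropped first, so the shell calls are left commented out. buildTextFilter is a hypothetical helper name.

```javascript
// Build a $text filter, optionally naming the language explicitly.
function buildTextFilter(search, language) {
  var text = { $search: search };
  if (language) {
    text.$language = language;
  }
  return { $text: text };
}

// Filter for the term from the question, with stemming and stop words disabled.
var filter = buildTextFilter("ere", "none");

// Index options for rebuilding the text index with simple tokenization.
var indexOptions = { default_language: "none" };

// In the mongo shell (commented out because it drops and rebuilds the index,
// which can take a long time on millions of large documents):
// db.tests.dropIndex("a_text");
// db.tests.createIndex({ a: "text" }, indexOptions);
// db.tests.find(filter).explain("executionStats");
```

Comparing the new explain output with the original one would show whether the number of keys and documents examined changes.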
Love MongoDB! Have fun!