$存在的最佳复合索引:true(稀疏索引) [英] Optimal Compound Indexes for $exists : true (sparse indexes)

查看:88
本文介绍了$存在的最佳复合索引:true(稀疏索引)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要加快这种查询的速度:

I need to speedup this kind of query:

db.col.find({ a: "foobar", b: { $exists: true} });

数据分发

字段的存在:

  • 所有文档中存在字段a
  • 字段b仅存在于其中的〜10%.
  • Data Distribution

    Existence of fields:

    • The field a exists in all documents,
    • The field b exists only in ~10% of them.
    • db.col.count() // 1,050,505
      db.col.count({ a : "foobar" }) // 517.967
      db.col.count({ a : "foobar", b : { $exists: true} }) // 44.922
      db.col.count({ b : { $exists: true} }) // 88.981
      

      真实的数据增长:

      到目前为止,已装入两个批次(约50万的2倍). 每个月将添加另外约500,000个文档. a字段是此批次的名称.这些新添加的文档将具有相同的字段分布(大约10%的新加载文档将具有b字段)

      Futrue data growth:

      So far two batches where loaded (2x around 500,000). Each month another batch of ~500,000 documents will be added. The a field is the name of this batch. Those newly added documents will have the same distribution of fields (around 10% of the newly loaded documents will have the b field)

      我在{a:1, b:1}上创建了一个稀疏索引,但是由于a出现在所有文档中,因此无法加快索引速度.那是因为MongoDB中稀疏索引的行为.从文档:

      I created a sparse index on {a:1, b:1} but because a is present in all documents, that doesn't speed it up. Thats because the behaviour of sparse indexes in MongoDB. From the docs:

      仅包含升序/降序索引键的稀疏复合索引将对文档建立索引,只要该文档包含至少一个键即可.

      Sparse compound indexes that only contain ascending/descending index keys will index a document as long as the document contains at least one of the keys.

      这是上一个查询的.explain():

      {
          "queryPlanner" : {
              "plannerVersion" : 1,
              "namespace" : "myCol",
              "indexFilterSet" : false,
              "parsedQuery" : {
                  "$and" : [ 
                      {
                          "a" : {
                              "$eq" : "foobar"
                          }
                      }, 
                      {
                          "b" : {
                              "$exists" : true
                          }
                      }
                  ]
              },
              "winningPlan" : {
                  "stage" : "KEEP_MUTATIONS",
                  "inputStage" : {
                      "stage" : "FETCH",
                      "filter" : {
                          "b" : {
                              "$exists" : true
                          }
                      },
                      "inputStage" : {
                          "stage" : "IXSCAN",
                          "keyPattern" : {
                              "a" : 1,
                              "b" : 1
                          },
                          "indexName" : "a_1_b_1",
                          "isMultiKey" : false,
                          "direction" : "forward",
                          "indexBounds" : {
                              "a" : [ 
                                  "[\"foobar\", \"foobar\"]"
                              ],
                              "b" : [ 
                                  "[MinKey, MaxKey]"
                              ]
                          }
                      }
                  }
              },
              "rejectedPlans" : []
          },
          "executionStats" : {
              "executionSuccess" : true,
              "nReturned" : 44922,
              "executionTimeMillis" : 208656,
              "totalKeysExamined" : 517967,
              "totalDocsExamined" : 517967,
              "executionStages" : {
                  "stage" : "KEEP_MUTATIONS",
                  "nReturned" : 44922,
                  "executionTimeMillisEstimate" : 180672,
                  "works" : 550772,
                  "advanced" : 44922,
                  "needTime" : 473045,
                  "needFetch" : 32804,
                  "saveState" : 41051,
                  "restoreState" : 41051,
                  "isEOF" : 1,
                  "invalidates" : 0,
                  "inputStage" : {
                      "stage" : "FETCH",
                      "filter" : {
                          "b" : {
                              "$exists" : true
                          }
                      },
                      "nReturned" : 44922,
                      "executionTimeMillisEstimate" : 180612,
                      "works" : 550772,
                      "advanced" : 44922,
                      "needTime" : 473045,
                      "needFetch" : 32804,
                      "saveState" : 41051,
                      "restoreState" : 41051,
                      "isEOF" : 1,
                      "invalidates" : 0,
                      "docsExamined" : 517967,
                      "alreadyHasObj" : 0,
                      "inputStage" : {
                          "stage" : "IXSCAN",
                          "nReturned" : 517967,
                          "executionTimeMillisEstimate" : 3035,
                          "works" : 517967,
                          "advanced" : 517967,
                          "needTime" : 0,
                          "needFetch" : 0,
                          "saveState" : 41051,
                          "restoreState" : 41051,
                          "isEOF" : 1,
                          "invalidates" : 0,
                          "keyPattern" : {
                              "a" : 1,
                              "b" : 1
                          },
                          "indexName" : "a_1_b_1",
                          "isMultiKey" : false,
                          "direction" : "forward",
                          "indexBounds" : {
                              "a" : [ 
                                  "[\"foobar\", \"foobar\"]"
                              ],
                              "b" : [ 
                                  "[MinKey, MaxKey]"
                              ]
                          },
                          "keysExamined" : 517967, // INFO: I think that this is too much. These are all documents having a:"foobar"
                          "dupsTested" : 0,
                          "dupsDropped" : 0,
                          "seenInvalidated" : 0,
                          "matchTested" : 0
                      }
                  }
              },
              "allPlansExecution" : []
          },
          "serverInfo" : {
              "host" : "productive-mongodb-16",
              "port" : 27000,
              "version" : "3.0.1",
              "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952"
          }
      }
      

      a存在于所有1,000,000个文档中,其中520,000个具有a:"foobar".在整个集合中,有88,000个具有b字段.

      a exists in all 1,000,000 documents and 520,000 of them have a:"foobar". In the whole collection 88,000 having the b field.

      如何加快查询速度(以便IXSCAN仅返回44k而不是520k)?

      How to speedup my query (so that IXSCAN returns only 44k instead of 520k)?

      推荐答案

      您在这里似乎不了解的是 $exists 不能以任何方式抓住"索引,即使在稀疏的地方也是如此. 正如文档本身所说:

      What you do not seem to be understanding here is that $exists cannot "grab" an index in any way, even where sparse. As the documentation itself says:

      如果索引稀疏会导致查询和排序操作的结果集不完整,MongoDB将不会使用该索引"

      这些页面中给出的示例是{ "$exists": false }查询.但是反向逻辑条件在这里没有任何区别.

      The example given in those pages is an { "$exists": false } query. But the reverse logical condition does not make any difference here.

      为了获得稀疏"索引的全部好处",那么您需要考虑其拥有的数据类型"并进行适当的查询.

      In order to get the "full benefit" of a "sparse" index then you need to consider the "type" of data it holds and query appropriately.

      对于数字,类似:

      db.collection.find({ "a": "foobar", "b": { "$gte": -9999, "$lte": 9999 } })
      

      使用索引,而使用稀疏索引.或基于文本的:

      Which uses an index, and the sparse one. Or for text based:

      db.collection.find({ "a": "foobar", "b": /.+/ })
      

      还将使用稀疏索引,仅查看定义了"b"的那些对象.

      Which will also use the sparse index and only look at those where "b" was defined.

      对于数组",然后小心".除非您执行此操作,否则所查看的值可能是以上之一:

      For "arrays" then "be careful". As the value being looked at is probably one of the above unless you did this:

      db.collection.insert({ "a": 1, "b": [[]] })
      

      那没关系:

      db.ab.find({ "a": 1, "b": { "$type": 4 } })
      

      但出于相同的原因,$exists在这里将无法工作.

      But is not really going to use the "sparse" index either for the same reasons $exists won`t work here.

      因此,您需要了解术语的含义,并适当查询",以便在期望最高性能的情况下使用您创建的索引定义.

      So you need to understand what the terms mean here are, and "query appropriately" in order to use the index definitions that you create if you expect the maximum performance.

      这些都是清晰的示例,您可以自己测试并查看结果是否正确.我确实希望在这些方面使核心文档更加清晰,但是我也知道很多人都在尝试做出贡献(并做出了出色的解释),但是到目前为止,这些都还没有包括在内.

      These are clear examples you can test for yourself and see the results are true. I do wish the core documentation was clearer on these points, but I am also aware many have tried to contribute ( and have produced excellent explainations ) but none of these have been included to date.

      猜猜这就是为什么你在这里问.

      Guess that is why you are asking here.

      这篇关于$存在的最佳复合索引:true(稀疏索引)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆