How to find documents having a query value within the range of two key values

Question

I'm analyzing texts. These texts have annotations (e.g. "chapter", "scenery", ...). The annotations are stored in my MongoDB collection annotations, for example:

db.annotations.insertMany([
  {
    start: 1,
    stop: 10000,
    type: "chapter",
    details: {
      number: 1,
      title: "Where it all began"
    }
  },
  {
    start: 10001,
    stop: 20000,
    type: "chapter",
    details: {
      number: 2,
      title: "Lovers"
    }
  },
  {
    start: 1,
    stop: 5000,
    type: "scenery",
    details: {
      descr: "castle"
    }
  },
  {
    start: 5001,
    stop: 15000,
    type: "scenery",
    details: {
      descr: "forest"
    }
  }
]);

Challenge 1: For a given position in the text, I'd like to find all annotations. For example, querying for character 1234 should tell me that


  • it is within chapter 1

  • it takes place in the castle

Challenge 2: I'd also like to query for ranges. For example, querying for characters from 9800 to 10101 should tell me that it touches chapter 1, chapter 2 and the scenery forest.

Challenge 3: Similar to challenge 2, I'd like to match only those annotations that are completely covered by the query range. For example, querying for characters from 9800 to 30000 should only return the document for chapter 2.

For challenge 1 I tried to simply use $lt and $gt, e.g.:

db.annotations.find({start: {$lt: 1234}, stop: {$gt: 1234}});

But I realized that only the index on start is used, even though I have a compound index on start and stop. Is there a way to create more suitable indexes for the three problems I mentioned?

I briefly thought of geospatial indexes, but I haven't used them yet. I also only need a one-dimensional version of them.

Answer

For Challenge 1, the query you are using is appropriate, though you might want to use $lte and $gte so that positions exactly on a boundary are included:

db.annotations.find({ "start": { "$lte": 1234 }, "stop": { "$gte": 1234 }});
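To sanity-check the inclusive ($lte/$gte) version of the Challenge 1 query without a running server, here is a minimal sketch in plain JavaScript. It assumes the sample annotations from the question are held in an ordinary array, with type/details collapsed into a hypothetical label field for brevity; pointFilter and matchesPoint are hypothetical helper names. pointFilter only builds the filter document, and matchesPoint applies the equivalent predicate in memory.

```javascript
// Sample annotations from the question, with type/details collapsed into a
// hypothetical "label" field for brevity.
const annotations = [
  { start: 1, stop: 10000, label: "chapter 1" },
  { start: 10001, stop: 20000, label: "chapter 2" },
  { start: 1, stop: 5000, label: "castle" },
  { start: 5001, stop: 15000, label: "forest" },
];

// Build the inclusive Challenge 1 filter for a character position.
function pointFilter(pos) {
  return { start: { $lte: pos }, stop: { $gte: pos } };
}

// The same predicate, evaluated in memory instead of by MongoDB.
function matchesPoint(doc, pos) {
  return doc.start <= pos && doc.stop >= pos;
}

const hits = annotations.filter((d) => matchesPoint(d, 1234)).map((d) => d.label);
console.log(hits); // [ 'chapter 1', 'castle' ]

// Position 1 is the first character of chapter 1 and of the castle scenery:
// strict $lt/$gt bounds miss both, inclusive bounds find both.
const exclusiveCount = annotations.filter((d) => d.start < 1 && d.stop > 1).length;
const inclusiveCount = annotations.filter((d) => matchesPoint(d, 1)).length;
console.log(exclusiveCount, inclusiveCount); // 0 2
```

Since the mongo shell is itself JavaScript, the builder could be pasted there and used as db.annotations.find(pointFilter(1234)).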

Regarding indexes, the reason it chooses the index on start instead of the compound index has to do with the tree structure of compound indexes, which is nicely explained by Rob Moore in this answer. Note that MongoDB can still use the compound index if you force it with hint(), but the query optimiser finds it faster to use the index on start and then weed out the results that don't match the range in the stop clause.
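The tree-structure argument can be illustrated with a deliberately simplified model of a compound index on { start: 1, stop: 1 }: an array of keys sorted by start, then by stop. This is a sketch under stated assumptions, not how MongoDB's planner or B-tree actually works (it ignores pages and index seeks), and scanCompound is a hypothetical name. The keys with start <= pos form one contiguous run at the front of the index, but the stop condition cannot shrink that run; it can only discard keys inside it, so every key in the run is still examined, much as with the index on start alone.

```javascript
// Keys of a compound index { start: 1, stop: 1 }, built from the sample data
// and sorted by start first, then by stop.
const keys = [
  [1, 10000],     // chapter 1
  [10001, 20000], // chapter 2
  [1, 5000],      // castle
  [5001, 15000],  // forest
].sort((a, b) => a[0] - b[0] || a[1] - b[1]);

// Simplified model of an index scan for start <= pos AND stop >= pos.
function scanCompound(sortedKeys, pos) {
  let keysExamined = 0;
  const matches = [];
  for (const [start, stop] of sortedKeys) {
    if (start > pos) break; // sorted by start: no later key can match
    keysExamined++;
    if (stop >= pos) matches.push([start, stop]); // stop only filters the run
  }
  return { keysExamined, matches };
}

console.log(scanCompound(keys, 12000));
```

For position 12000 the model examines all four keys to return two of them, which is why the extra stop bound buys little over the index on start.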

For Challenge 2, you just need an explicit $or clause that covers three cases: stop is within the bounds, start is within the bounds, or start and stop together encompass the bounds.

db.annotations.find({
    "$or": [
        { "stop": { "$gte": 9800, "$lte": 10101 }},
        { "start": { "$gte": 9800, "$lte": 10101 }},
        { "start": { "$lt": 9800 }, "stop": { "$gt": 10101 }}
    ]
});
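The three branches of the $or together describe ordinary interval overlap. Assuming every annotation has start <= stop, two closed intervals [start, stop] and [from, to] overlap exactly when start <= to and stop >= from, so the same documents can also be matched without $or. The sketch below is an alternative formulation, not part of the original answer: it evaluates both conditions in plain JavaScript on the sample data (with a hypothetical label field standing in for type/details), and overlapFilter is a hypothetical helper that builds the corresponding filter document.

```javascript
// Sample annotations; "label" is a hypothetical stand-in for type/details.
const annotations = [
  { start: 1, stop: 10000, label: "chapter 1" },
  { start: 10001, stop: 20000, label: "chapter 2" },
  { start: 1, stop: 5000, label: "castle" },
  { start: 5001, stop: 15000, label: "forest" },
];

// The answer's three-branch $or, evaluated in memory.
function matchesOr(d, from, to) {
  return (
    (d.stop >= from && d.stop <= to) ||   // stop inside the range
    (d.start >= from && d.start <= to) || // start inside the range
    (d.start < from && d.stop > to)       // annotation spans the whole range
  );
}

// Single overlap condition: start <= to AND stop >= from.
function matchesOverlap(d, from, to) {
  return d.start <= to && d.stop >= from;
}

// Filter document for the overlap condition.
function overlapFilter(from, to) {
  return { start: { $lte: to }, stop: { $gte: from } };
}

const orHits = annotations.filter((d) => matchesOr(d, 9800, 10101)).map((d) => d.label);
const overlapHits = annotations.filter((d) => matchesOverlap(d, 9800, 10101)).map((d) => d.label);
console.log(orHits);      // [ 'chapter 1', 'chapter 2', 'forest' ]
console.log(overlapHits); // [ 'chapter 1', 'chapter 2', 'forest' ]
```

With from equal to to, the overlap filter reduces to the inclusive Challenge 1 query.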

For Challenge 3, you can use a query very similar to the one in Challenge 1, but require that each document lies completely within the given bounds.

db.annotations.find({ "start": { "$gte": 9800 }, "stop": { "$lte": 30000 }});
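Containment requires both ends of the annotation to fall inside the query range: start >= from and stop <= to. A minimal in-memory sketch on the sample data, assuming a hypothetical label field in place of type/details and a hypothetical containedFilter helper, confirms that the range 9800 to 30000 covers only chapter 2:

```javascript
// Sample annotations; "label" is a hypothetical stand-in for type/details.
const annotations = [
  { start: 1, stop: 10000, label: "chapter 1" },
  { start: 10001, stop: 20000, label: "chapter 2" },
  { start: 1, stop: 5000, label: "castle" },
  { start: 5001, stop: 15000, label: "forest" },
];

// Filter document for annotations lying entirely inside [from, to].
function containedFilter(from, to) {
  return { start: { $gte: from }, stop: { $lte: to } };
}

// The same predicate, evaluated in memory.
function isContained(d, from, to) {
  return d.start >= from && d.stop <= to;
}

const covered = annotations.filter((d) => isContained(d, 9800, 30000)).map((d) => d.label);
console.log(covered); // [ 'chapter 2' ]
```

Chapter 1 and the forest scenery touch the range but start before 9800, so they are excluded here even though an overlap query would return them.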
