How to find documents having a query-value within the range of two key-values
Question
I'm analyzing texts. Those texts have annotations (e.g. "chapter", "scenery", ...). The annotations are stored in my MongoDB collection annotations, e.g.:
{
    start: 1,
    stop: 10000,
    type: "chapter",
    details: {
        number: 1,
        title: "Where it all began"
    }
},
{
    start: 10001,
    stop: 20000,
    type: "chapter",
    details: {
        number: 2,
        title: "Lovers"
    }
},
{
    start: 1,
    stop: 5000,
    type: "scenery",
    details: {
        descr: "castle"
    }
},
{
    start: 5001,
    stop: 15000,
    type: "scenery",
    details: {
        descr: "forest"
    }
}
Challenge 1: For a given position in the text, I'd like to find all annotations. For example, querying for character 1234 should tell me that
- it is within chapter one
- it takes place in a castle
Challenge 2: I'd also like to query for ranges. For example, querying for characters from 9800 to 10101 should tell me that it touches chapter 1, chapter 2 and the scenery forest.
Challenge 3: Similar to challenge 2, but I'd like to match only those annotations that are completely covered by the query range. For example, querying for characters from 9800 to 30000 should return only the document for chapter 2.
For challenge 1 I tried to simply use $lt and $gt, e.g.:
db.annotations.find({start: {$lt: 1234}, stop: {$gt: 1234}});
But I realized that only the index on the key start is used, even if I have a compound index on start and stop. Is there a way to create more adequate indexes for the three problems I mentioned?
I briefly thought of geospatial indexes, but I haven't used them yet. I also only need a one-dimensional version of them.
Recommended answer
For Challenge 1, the query you are using is appropriate, though you might want to use $lte and $gte to be inclusive.
db.annotations.find({ "start": { "$lt": 1234 }, "stop": { "$gt": 1234 }});
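The difference between the strict and the inclusive operators only shows up at an annotation's boundary. The following is a minimal in-memory sketch in plain JavaScript, not a MongoDB call: it mirrors the two filters on the sample data from the question. The label field and both function names are made up for illustration.

```javascript
// Sample annotations from the question; "label" is a hypothetical
// shorthand field added here only to keep the output readable.
const annotations = [
  { start: 1, stop: 10000, type: "chapter", label: "chapter 1" },
  { start: 10001, stop: 20000, type: "chapter", label: "chapter 2" },
  { start: 1, stop: 5000, type: "scenery", label: "castle" },
  { start: 5001, stop: 15000, type: "scenery", label: "forest" },
];

// In-memory equivalent of { start: { $lte: pos }, stop: { $gte: pos } }.
function annotationsAt(docs, pos) {
  return docs.filter((d) => d.start <= pos && d.stop >= pos);
}

// In-memory equivalent of { start: { $lt: pos }, stop: { $gt: pos } }.
function annotationsStrictlyAround(docs, pos) {
  return docs.filter((d) => d.start < pos && d.stop > pos);
}

// Position 10000 is the last character of chapter 1.
console.log(annotationsAt(annotations, 10000).map((d) => d.label));
console.log(annotationsStrictlyAround(annotations, 10000).map((d) => d.label));
```

With the inclusive version, position 10000 still counts as part of chapter 1; the strict version drops it because stop is not greater than 10000.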
Regarding indexes, the reason it chooses the index on start instead of the compound index has to do with the tree structure of compound indexes, which is nicely explained by Rob Moore in this answer. Note that it can still use the compound index if you use hint(), but the query optimiser finds it faster to use the index on start and then weed out the results that don't match the range for the stop clause.
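One way to check this on your own data is to force the compound index with hint() and inspect the plan. The sketch below assumes the mongo shell (or mongosh), the annotations collection from the question, and a compound index on { start: 1, stop: 1 }. The helper name explainWithCompoundIndex is hypothetical, and db is passed in as a parameter only to keep the function self-contained.

```javascript
// Hypothetical helper: runs the Challenge 1 query with the compound index
// forced via hint() and returns the execution statistics for inspection.
function explainWithCompoundIndex(db, position) {
  // Assumption: a compound index on start and stop is wanted;
  // createIndex does nothing if an identical index already exists.
  db.annotations.createIndex({ start: 1, stop: 1 });

  return db.annotations
    .find({ start: { $lte: position }, stop: { $gte: position } })
    .hint({ start: 1, stop: 1 }) // force the compound index
    .explain("executionStats"); // report keys and documents examined
}
```

In the shell this could be called as explainWithCompoundIndex(db, 1234). Comparing the result with the same query run without hint() shows how many index keys and documents each plan examines.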
For Challenge 2, you just need to use an explicit $or clause to cover three cases: stop is within the bounds, start is within the bounds, or start and stop encompass the bounds.
db.annotations.find({
    "$or": [
        { "stop": { "$gte": 9800, "$lte": 10101 }},
        { "start": { "$gte": 9800, "$lte": 10101 }},
        { "start": { "$lt": 9800 }, "stop": { "$gt": 10101 }}
    ]
});
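If every annotation satisfies start <= stop, and the query range is given with its lower bound first, the three $or cases collapse into a single pair of conditions: the annotation starts no later than the range ends, and stops no earlier than the range begins. The sketch below is an assumption-labelled illustration in plain JavaScript, not part of the original answer. It builds that filter and compares both formulations in memory on the sample data; the label field and all function names are hypothetical.

```javascript
// Sample annotations from the question; "label" is a hypothetical
// shorthand field added here only to keep the output readable.
const annotations = [
  { start: 1, stop: 10000, type: "chapter", label: "chapter 1" },
  { start: 10001, stop: 20000, type: "chapter", label: "chapter 2" },
  { start: 1, stop: 5000, type: "scenery", label: "castle" },
  { start: 5001, stop: 15000, type: "scenery", label: "forest" },
];

// Builds a MongoDB filter for annotations overlapping [from, to], inclusive.
// Assumes start <= stop in every document and from <= to.
function overlapFilter(from, to) {
  return { start: { $lte: to }, stop: { $gte: from } };
}

// In-memory version of the three-case $or query.
function touchesThreeCases(d, from, to) {
  return (
    (d.stop >= from && d.stop <= to) ||
    (d.start >= from && d.start <= to) ||
    (d.start < from && d.stop > to)
  );
}

// In-memory version of overlapFilter.
function touchesTwoBounds(d, from, to) {
  return d.start <= to && d.stop >= from;
}

const viaOr = annotations.filter((d) => touchesThreeCases(d, 9800, 10101));
const viaBounds = annotations.filter((d) => touchesTwoBounds(d, 9800, 10101));
console.log(viaOr.map((d) => d.label));
console.log(viaBounds.map((d) => d.label));
```

In the shell, the filter could be passed directly, as in db.annotations.find(overlapFilter(9800, 10101)).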
For Challenge 3, you can use a query very similar to the one in Challenge 1, but ensuring that the documents are completely covered by the given bounds.
db.annotations.find({ "start": { "$gte": 9800 }, "stop": { "$lte": 30000 }});
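To confirm that this containment condition selects only chapter 2 for the range 9800 to 30000, here is a minimal in-memory sketch in plain JavaScript on the sample data. It mirrors the filter rather than calling MongoDB; the label field and the function name coveredBy are made up for illustration.

```javascript
// Sample annotations from the question; "label" is a hypothetical
// shorthand field added here only to keep the output readable.
const annotations = [
  { start: 1, stop: 10000, type: "chapter", label: "chapter 1" },
  { start: 10001, stop: 20000, type: "chapter", label: "chapter 2" },
  { start: 1, stop: 5000, type: "scenery", label: "castle" },
  { start: 5001, stop: 15000, type: "scenery", label: "forest" },
];

// In-memory equivalent of { start: { $gte: from }, stop: { $lte: to } }:
// keeps only annotations that lie entirely inside [from, to].
function coveredBy(docs, from, to) {
  return docs.filter((d) => d.start >= from && d.stop <= to);
}

console.log(coveredBy(annotations, 9800, 30000).map((d) => d.label));
```

Chapter 1 and the forest both start before 9800, so they are only touched by the range, not covered by it.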