MongoDB query too slow when using the $or operator
Question
I'm trying to run this query against my collection Audios:
var querySlow = {
"palabra": {
$regex: "^" + keywords,
"$options": "i"
},
$or: [{
"_p_pais": {
$in: interested_accents
}
}, {
"languageCodeTatoeba": {
$in: interested_accents_tatoeba
}
}]
}; // takes 20 seconds
This is really slow, but if I remove either of the $or clauses it is very fast. For example:
var queryFast1 = {
"palabra": {
$regex: "^" + keywords,
"$options": "i"
},
$or: [{
"_p_pais": {
$in: interested_accents
}
}]
}; // takes less than 1 second
or this one:
var queryFast2 = {
"palabra": {
$regex: "^" + keywords,
"$options": "i"
},
$or: [{
"languageCodeTatoeba": {
$in: interested_accents_tatoeba
}
}]
}; // takes less than 1 second
This is the .explain() of the slow query:
I don't really know how to manage indexes. Should I create an index on this collection?
Recommended answer
There are some issues with your query and indexes:
1. $or clauses use different indexes
MongoDB only uses one index for a query, with the exception of queries involving an $or clause. From the Indexing Strategies page:
Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index
Also, from the $or Clauses and Indexes page:
That is, for MongoDB to use indexes to evaluate an $or expression, all the clauses in the $or expression must be supported by indexes.
With regard to your query, you could try rearranging it to make the $or clause a top-level clause:
{$or: [
{"palabra": {...}, "_p_pais": {...} },
{"palabra": {...}, "languageCodeTatoeba": {...}}
]}
In this form, MongoDB can use two indexes:
- A compound index with the palabra and _p_pais terms, and
- A compound index with the palabra and languageCodeTatoeba terms
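As a minimal sketch of the rearrangement above, the following builds the top-level $or form of the slow query. The helper name buildTopLevelOrQuery and the sample argument values are hypothetical; the field names and the Audios collection come from the question. The createIndex calls are shown only as comments because they need a live database.

```javascript
// Hypothetical helper: builds the top-level $or form of the slow query.
function buildTopLevelOrQuery(keywords, interestedAccents, interestedAccentsTatoeba) {
  // The same prefix condition is repeated inside each $or branch so that
  // each branch can be served by its own compound index.
  const palabra = { $regex: "^" + keywords, $options: "i" };
  return {
    $or: [
      { palabra: palabra, _p_pais: { $in: interestedAccents } },
      { palabra: palabra, languageCodeTatoeba: { $in: interestedAccentsTatoeba } }
    ]
  };
}

// Compound indexes that could back each branch (mongo shell, not run here):
//   db.Audios.createIndex({ palabra: 1, _p_pais: 1 })
//   db.Audios.createIndex({ palabra: 1, languageCodeTatoeba: 1 })

// Sample values are made up for illustration only.
const topLevelOrQuery = buildTopLevelOrQuery("hol", ["accentA"], ["spa"]);
console.log(JSON.stringify(topLevelOrQuery, null, 2));
```

The resulting object can be passed to db.Audios.find() in place of the original query.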
Please use explain("executionStats") to check whether the indexes are used correctly. The key metric to watch is the ratio of the number of documents returned (nReturned) to the total number of docs/keys examined. The closer the ratio is to 1, the more selective your query is, and the better the performance.
For example, if MongoDB has to examine 1000 docs (totalDocsExamined: 1000) but only returns 10 documents (nReturned: 10), then your query is not very selective (i.e. a ratio of 10/1000). An ideal query has a ratio close to or equal to 1, e.g. nReturned: 10, totalDocsExamined: 10, a ratio of 1 (10/10).
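The ratio described above can be computed from the executionStats section of the explain output. This is a rough sketch: selectivityRatio is a hypothetical helper name, and it takes the larger of totalDocsExamined and totalKeysExamined as the amount of work done, which is an assumption made here rather than an official formula.

```javascript
// Hypothetical helper: ratio of returned documents to examined docs/keys.
// Expects the executionStats sub-document of an explain("executionStats") result.
function selectivityRatio(executionStats) {
  const docs = executionStats.totalDocsExamined || 0;
  const keys = executionStats.totalKeysExamined || 0;
  // Assumption: use whichever count is larger as the work done.
  const examined = Math.max(docs, keys);
  if (examined === 0) {
    return null; // nothing was examined, so the ratio is undefined
  }
  return executionStats.nReturned / examined;
}

// The two cases from the text above.
const poorRatio = selectivityRatio({ nReturned: 10, totalDocsExamined: 1000, totalKeysExamined: 1000 });
const idealRatio = selectivityRatio({ nReturned: 10, totalDocsExamined: 10, totalKeysExamined: 10 });
console.log(poorRatio, idealRatio);
```

In the mongo shell, the input would come from something like db.Audios.find(query).explain("executionStats").executionStats.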
For more information regarding explain(), please see the MongoDB documentation.
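2. Too many indexes
Having too many indexes can result in:
- The query planner choosing a suboptimal index, because the candidates all look alike and it cannot tell which one to use.
- Relatively slower insert/update performance, because every insert/update to a field contained in an index also requires an insert/update to that index.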
From the explain result you posted, you have at least these indexes in the collection:
_p_pais_-1__p_user_-1__created_at_-1
languageCodeTatoeba_1_lowercase_1
languageCodeTatoeba_1
languageCodeTatoeba_-1
_p_pais_-1
_p_pais_1_languageCodeTatoeba_1
palabra_-1
palabra_1__created_at_-1
There are two issues with this set of indexes:
- Among the indexes, some are redundant. For example, languageCodeTatoeba_1 (an ascending index) and languageCodeTatoeba_-1 (a descending index) are practically the same index, because a single-field index can be traversed in either direction. One of them can be removed without any effect on query performance.
- A lot of indexes are a prefix of another. For example, palabra_-1 and palabra_1__created_at_-1. The palabra_-1 index can be removed, since it is a prefix of the palabra_1__created_at_-1 index. Please see the Compound Index: Prefix page for more details.
From a cursory glance, you may be able to trim your index list to only contain these 4 indexes instead of 8:
_p_pais_-1__p_user_-1__created_at_-1
languageCodeTatoeba_1_lowercase_1
_p_pais_1_languageCodeTatoeba_1
palabra_1__created_at_-1
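The trimming above can be reproduced with a small prefix check. This is only a sketch under simplifying assumptions: the key patterns are transcribed by hand from the index names, the helper names are hypothetical, and an index is flagged when it is a strictly shorter prefix of another index with the same (or fully reversed) sort directions. It ignores index options such as unique, sparse, or partial filters, which can make a prefix index worth keeping.

```javascript
// Index names from the explain output, with key patterns as [field, direction] pairs.
const existingIndexes = [
  { name: "_p_pais_-1__p_user_-1__created_at_-1", key: [["_p_pais", -1], ["_p_user", -1], ["_created_at", -1]] },
  { name: "languageCodeTatoeba_1_lowercase_1", key: [["languageCodeTatoeba", 1], ["lowercase", 1]] },
  { name: "languageCodeTatoeba_1", key: [["languageCodeTatoeba", 1]] },
  { name: "languageCodeTatoeba_-1", key: [["languageCodeTatoeba", -1]] },
  { name: "_p_pais_-1", key: [["_p_pais", -1]] },
  { name: "_p_pais_1_languageCodeTatoeba_1", key: [["_p_pais", 1], ["languageCodeTatoeba", 1]] },
  { name: "palabra_-1", key: [["palabra", -1]] },
  { name: "palabra_1__created_at_-1", key: [["palabra", 1], ["_created_at", -1]] }
];

// True when `shorter` is a strict prefix of `longer`: same fields in the same
// order, with directions either all equal or all reversed.
function isPrefixOf(shorter, longer) {
  if (shorter.length >= longer.length) return false;
  if (!shorter.every((pair, i) => pair[0] === longer[i][0])) return false;
  const same = shorter.every((pair, i) => pair[1] === longer[i][1]);
  const reversed = shorter.every((pair, i) => pair[1] === -longer[i][1]);
  return same || reversed;
}

// Names of indexes that are a prefix of some other index in the list.
function findPrefixIndexes(indexes) {
  return indexes
    .filter((candidate) => indexes.some((other) => isPrefixOf(candidate.key, other.key)))
    .map((candidate) => candidate.name);
}

const removableIndexes = findPrefixIndexes(existingIndexes);
console.log(removableIndexes);
```

The indexes not flagged by this check are the four listed above.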
Please see the MongoDB documentation for more information regarding indexes.
3. Why removing one clause from the $or term speeds up the queries
This is because the query
{"palabra": {...}, $or: [{"_p_pais": {...}}]}
is essentially the same as
{"palabra": {...}, "_p_pais": {...}}
Assuming you have a compound index such as palabra_1__p_pais_1, MongoDB would be able to use that index.
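Similarly,
{"palabra": {...}, $or: [{"languageCodeTatoeba": {...}}]}
is essentially the same as
{"palabra": {...}, "languageCodeTatoeba": {...}}
This query could use the _p_pais_1_languageCodeTatoeba_1 index, which you already have in your collection. Note, however, that languageCodeTatoeba is not the leading field of that index, so an index that starts with palabra or languageCodeTatoeba, such as languageCodeTatoeba_1_lowercase_1, is likely a better fit.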
In short, those two queries are fast because reducing the $or to a single clause effectively removes it, enabling MongoDB to use the correct index.
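The equivalence used in this section can be illustrated with a small helper that folds a single-clause $or into the surrounding query. This is a sketch for illustration only: collapseSingleClauseOr is a hypothetical name, the sample values are made up, and the function deliberately leaves the query untouched when a field appears both inside and outside the $or.

```javascript
// Hypothetical helper: folds a one-clause $or into the enclosing query object.
function collapseSingleClauseOr(query) {
  const clauses = query.$or;
  if (!Array.isArray(clauses) || clauses.length !== 1) {
    return query; // nothing to collapse
  }
  // Copy every field except $or.
  const { $or, ...rest } = query;
  const onlyClause = clauses[0];
  // Bail out on field collisions rather than guess how to merge conditions.
  for (const field of Object.keys(onlyClause)) {
    if (field in rest) return query;
  }
  return { ...rest, ...onlyClause };
}

// Same shape as queryFast1 from the question, with made-up values.
const singleOrQuery = {
  palabra: { $regex: "^hol", $options: "i" },
  $or: [{ _p_pais: { $in: ["accentA"] } }]
};
const collapsedQuery = collapseSingleClauseOr(singleOrQuery);
console.log(JSON.stringify(collapsedQuery));
```

A query with two or more $or clauses is returned unchanged, which is the case where the rearrangement from section 1 applies instead.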