MongoDB Aggregation with $sample very slow
Problem description
There are many ways to select a random document from a MongoDB collection (as discussed in this answer). Comments point out that with MongoDB version >= 3.2, using $sample in the aggregation framework is preferred. However, on a collection with many small documents this seems to be extremely slow.
The following code uses mongoengine to simulate the issue and compare it to the "skip random" method:
import timeit
from random import randint

import mongoengine as mdb

mdb.connect("test-agg")


class ACollection(mdb.Document):
    name = mdb.StringField(unique=True)
    meta = {'indexes': ['name']}


# Start from an empty collection, then insert 50,000 small documents.
ACollection.drop_collection()
ACollection.objects.insert([ACollection(name="Document {}".format(n)) for n in range(50000)])


def agg():
    # Pick one random document with the $sample aggregation stage.
    doc = list(ACollection.objects.aggregate({"$sample": {'size': 1}}))[0]
    print(doc['name'])


def skip_random():
    # Pick one random document by skipping a random number of documents.
    n = ACollection.objects.count()
    # The offset must lie in 0..n-1; skipping n documents would leave nothing to return.
    doc = ACollection.objects.skip(randint(0, n - 1)).limit(1)[0]
    print(doc['name'])


if __name__ == '__main__':
    print("agg took {:2.2f}s".format(timeit.timeit(agg, number=1)))
    print("skip_random took {:2.2f}s".format(timeit.timeit(skip_random, number=1)))
The results are:
Document 44551
agg took 21.89s
Document 25800
skip_random took 0.01s
Whenever I've had performance issues with MongoDB in the past, my answer has always been to use the aggregation framework, so I'm surprised that $sample is so slow.
Am I missing something here? What is it about this example that is causing the aggregation to take so long?
Recommended answer
This is the result of a known bug in the WiredTiger storage engine in MongoDB versions < 3.2.3. Upgrading to the latest version should solve this.
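To check whether a given deployment falls in the affected range, something like the following sketch might help. It only encodes the condition stated in the answer (WiredTiger and a server version below 3.2.3). The helper names `parse_version` and `sample_bug_possible` are made up for this illustration and are not part of any MongoDB driver. The commented usage at the end assumes pymongo and a server reachable on the default host and port.

```python
def parse_version(version):
    """Turn a version string such as '3.2.1' into a tuple of ints, e.g. (3, 2, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so '3-rc0' is read as 3.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    # Pad short versions such as '3.4' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def sample_bug_possible(version, storage_engine="wiredTiger"):
    """Hypothetical helper: True if the server matches the condition in the answer
    (WiredTiger storage engine and a version below 3.2.3)."""
    return storage_engine == "wiredTiger" and parse_version(version) < (3, 2, 3)


# Possible usage against a live server (assumes pymongo is installed):
#
#   from pymongo import MongoClient
#   client = MongoClient()
#   version = client.server_info()["version"]
#   engine = client.admin.command("serverStatus")["storageEngine"]["name"]
#   print(sample_bug_possible(version, engine))
```

If the check returns True, upgrading the server as the answer suggests is the fix; until then, the "skip random" approach from the question can serve as a workaround.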