MongoDB find random dataset performance


Question

I have a collection with about 500,000 datasets (documents) in it, and I would like to fetch a random dataset from it. I can restrict the find() to a customer ID, which reduces the candidate set to about 80,000 datasets. The customer ID field is indexed.

In PHP I use the following command to get the random dataset:

 $mongoCursor = $mongoCollection->find($arrQuery, $arrFields)->skip(rand(1, $dataCount));

The profiler then reports:

 DB.Collection ntoskip:3224 nscanned:3326 nreturned:101 reslen:77979 262ms

At 262 ms, this takes quite some time to fetch the result. Is there a better way to get the data?
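Two details in the command and profiler line above may be worth checking. PHP's rand(1, $dataCount) is inclusive at both ends, so it can return $dataCount itself, and skipping that many documents leaves nothing to return. The profiler also shows nreturned:101, which suggests a whole first batch is sent back rather than a single document. A minimal sketch of an offset helper, written in JavaScript as used by the mongo shell; the name `randomSkipOffset` is hypothetical and not part of any driver:

```javascript
// Hypothetical helper (not a MongoDB API): returns a skip offset in
// [0, count - 1], so that skip(offset).limit(1) always lands on a document.
function randomSkipOffset(count, random = Math.random) {
  if (count <= 0) {
    throw new RangeError("count must be positive");
  }
  return Math.floor(random() * count);
}

// Assumed usage in the mongo shell (query and count come from the caller):
//   var offset = randomSkipOffset(count);
//   db.collection.find(query).skip(offset).limit(1);
```

Adding limit(1) only reduces how much is returned. The server still has to walk past the skipped entries, so large offsets will probably stay slow.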

I thought about fetching all IDs in PHP, then picking one ID at random and fetching the complete dataset for that ID. But I am worried about loading that much data into PHP.
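One way to ease the memory concern, sketched here under assumptions: if the IDs can be streamed (for example from a cursor that projects only the ID field), a single ID can be chosen uniformly without holding them all in memory. This uses reservoir sampling with a reservoir of one, a standard technique that is swapped in for illustration and is not something the question mentions. The function name `pickRandomId` is hypothetical.

```javascript
// Reservoir sampling with a reservoir of size one: returns a uniformly
// random element of `ids` while keeping only one candidate in memory.
// `ids` can be any iterable, e.g. an array or a generator over a cursor.
function pickRandomId(ids, random = Math.random) {
  let chosen; // stays undefined if `ids` is empty
  let seen = 0;
  for (const id of ids) {
    seen++;
    // Replace the current candidate with probability 1 / seen.
    if (random() * seen < 1) {
      chosen = id;
    }
  }
  return chosen;
}
```

This still reads every ID once, so it trades memory for a full pass over the matching IDs.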

Thanks for any thoughts on this topic. Dan

Recommended answer

Hi, I tried multiple solutions to the random problem. I used a cursor and moved it to the random position, but this was extremely slow. Then I fetched the full dataset and picked random items, which was okay but could be better.

The best-performing solution for me was to pick a set of random numbers, take their min and max values, and query the database using:

db.collection.find({...}).skip(min).limit(max-min);

Then I iterated once through the result, keeping an index that starts at i = min and increments with each item, and took only the items whose index matched a number in the random set. For me it was also acceptable to restrict the min-max range randomly. I used a logarithmic approach to choose the size of the min-max window according to my collection size.
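The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the author's code. The answer does not give the logarithmic formula, so `windowSize` below is a made-up stand-in, and all function names are hypothetical. A plain array stands in for the documents returned by find({...}).skip(min).limit(max-min).

```javascript
// Hypothetical window-size rule: grows with log2 of the collection size.
// The formula is an assumption; the answer does not specify one.
function windowSize(total, wanted) {
  const size = Math.ceil(wanted * Math.log2(total + 1));
  return Math.max(wanted, Math.min(total, size));
}

// Place a window at a random position and draw `wanted` distinct
// positions inside it. `max` is exclusive, so limit = max - min.
function pickOffsets(total, wanted, random = Math.random) {
  if (wanted < 1 || wanted > total) {
    throw new RangeError("wanted must be between 1 and total");
  }
  const size = windowSize(total, wanted);
  const min = Math.floor(random() * (total - size + 1));
  const max = min + size;
  const offsets = new Set();
  while (offsets.size < wanted) {
    offsets.add(min + Math.floor(random() * size));
  }
  return { min, max, offsets };
}

// Walk the window once, counting from i = min, and keep only the items
// whose position was drawn.
function selectFromWindow(windowDocs, min, offsets) {
  const picked = [];
  let i = min;
  for (const doc of windowDocs) {
    if (offsets.has(i)) {
      picked.push(doc);
    }
    i++;
  }
  return picked;
}
```

Because all picks come from one contiguous window, the items are not spread uniformly over the whole collection. The answer accepts that trade-off in exchange for issuing a single query.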

The result is a really fast way to pick random result sets.

I hope this helps somebody else too.

---Dan
