MongoDB: how to find 10 random documents in a collection of 100?

Question

Is MongoDB capable of finding a number of random documents without making multiple queries?

For example, I implemented this on the JS side after loading all the documents in the collection, which is wasteful. So I wanted to check whether this can be done better with one DB query.

The path I took on the JS side:

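  • Get all the data
  • Create an array of IDs
  • Shuffle the array of IDs (random order)
  • Cut the array down to the number of documents required
  • Build the list of documents by selecting, one by one from the whole collection, the IDs left after the previous two operations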
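
The shuffle-and-cut steps listed above can be sketched in plain JavaScript. This is only an illustration, assuming the IDs are already loaded into an array; `pickRandomIds` and its `rng` parameter are hypothetical names, not part of any MongoDB API.

```javascript
// Hypothetical helper: pick `n` distinct IDs at random from `ids`.
// Uses a partial Fisher-Yates shuffle, so only `n` swaps are needed.
function pickRandomIds(ids, n, rng = Math.random) {
  const pool = ids.slice();                 // copy, so the caller's array is untouched
  const count = Math.min(n, pool.length);   // never return more than we have
  for (let i = 0; i < count; i++) {
    // choose a random index in [i, pool.length)
    const j = i + Math.floor(rng() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}
```

This still requires loading every ID first, which is exactly the cost the question is trying to avoid.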

The two major drawbacks are that I either load all the data or make multiple queries.

Any advice is much appreciated.

Recommended answer

This was answered a long time ago and, since then, MongoDB has evolved greatly.

As posted in another answer, MongoDB has supported sampling within the Aggregation Framework since version 3.2:

You could do this:

db.products.aggregate([{$sample: {size: 5}}]); // You want to get 5 docs

Or:

db.products.aggregate([
  {$match: {category:"Electronic Devices"}}, // filter the results
  {$sample: {size: 5}} // You want to get 5 docs
]);

However, there are some caveats about the $sample operator:

(As of Nov 6th, 2017, when the latest version was 3.4.) The conditions to check are:

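  • $sample is the first stage of the pipeline
  • N is less than 5% of the total documents in the collection
  • The collection contains more than 100 documents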

If any of the above conditions is NOT met, $sample performs a collection scan followed by a random sort to select N documents.

That is the case in the last example above, where $match comes before $sample.
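
As a rough illustration of the three conditions quoted above, the check can be written as a small predicate. This is a sketch that only mirrors the wording of the list; `sampleAvoidsCollectionScan` and its parameters are hypothetical names, and the real decision is made inside the server.

```javascript
// Hypothetical helper mirroring the three listed conditions.
// stageIndex: position of $sample in the pipeline (0 = first stage)
// n:          requested sample size
// totalDocs:  number of documents in the collection
function sampleAvoidsCollectionScan(stageIndex, n, totalDocs) {
  const isFirstStage = stageIndex === 0;
  const isSmallSample = n < 0.05 * totalDocs;   // N is less than 5% of the total
  const isLargeCollection = totalDocs > 100;    // more than 100 documents
  return isFirstStage && isSmallSample && isLargeCollection;
}
```

Under these conditions, sampling 10 documents out of 100, as in the question title, would fall back to the collection scan, since 10 is not below 5% of 100 and the collection does not exceed 100 documents.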

OLD ANSWER

You could always run:

db.products.find({category:"Electronic Devices"}).skip(Math.floor(Math.random()*YOUR_COLLECTION_SIZE))

But the order won't be random, and you will need two queries (one count to get YOUR_COLLECTION_SIZE) or an estimate of how big the collection is (about 100 records, about 1,000, about 10,000...).
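
A minimal sketch of the two-query variant described above: count first, then skip to a random offset. `randomSkipOffset` is a hypothetical helper, and the shell usage in the comments assumes the `products` collection used elsewhere in this answer.

```javascript
// Hypothetical helper: random skip offset chosen so that `n` documents
// still fit after the skip. `count` is the number of matching documents.
function randomSkipOffset(count, n, rng = Math.random) {
  const maxOffset = Math.max(0, count - n);
  return Math.floor(rng() * (maxOffset + 1));   // integer in [0, maxOffset]
}

// Usage in the mongo shell (two queries):
// var query = {category: "Electronic Devices"};
// var count = db.products.count(query);
// db.products.find(query).skip(randomSkipOffset(count, 10)).limit(10);
```

Note that this returns a contiguous window of documents starting at a random position, not 10 independently chosen documents.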

You could also add a field holding a random number to every document and query by that number. The drawback is that you get the same results every time you run the same query. To work around that, you can play with limit and skip, or even with sort. You could also update those random numbers every time you fetch a record (which implies more queries).

--I don't know whether you are using Mongoose, Mongoid, or the Mongo driver for a specific language directly, so I'll write everything in terms of the mongo shell.

So, let's say your product record looks like this:

{
 _id: ObjectId("..."),
 name: "Awesome Product",
 category: "Electronic Devices",
}

I would suggest using:

{
 _id: ObjectId("..."),
 name: "Awesome Product",
 category: "Electronic Devices",
 _random_sample: Math.random()
}

Then you could do:

db.products.find({category:"Electronic Devices",_random_sample:{$gte:Math.random()}})

Then you could run the following periodically to refresh the documents' _random_sample field:

var your_query = {}; // matching every record would hurt performance if there are a lot of them
your_query = {category: "Electronic Devices"}; // narrow the update to one category
// upsert = false, multi = true
db.products.update(your_query, {$set: {_random_sample: Math.random()}}, false, true);

Or, whenever you retrieve some records, you could update all of them or just a few (depending on how many records you've retrieved):

for (var i = 0; i < records.length; i++) {
   var query = {_id: records[i]._id};
   // upsert = false, multi = false
   db.products.update(query, {$set: {_random_sample: Math.random()}}, false, false);
}

EDIT

Be aware that

db.products.update(your_query, {$set: {_random_sample: Math.random()}}, false, true)

won't work very well, because Math.random() is evaluated once and every product that matches your query gets the same random number. The last approach works better (updating some documents as you retrieve them).
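
One way to give each document its own value in a single round trip is to build one update operation per document and send them together. The sketch below assumes MongoDB 3.2 or later, where `bulkWrite` is available; `buildRandomSampleOps` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds one updateOne operation per document ID,
// each carrying its own freshly generated random value.
function buildRandomSampleOps(ids, rng = Math.random) {
  return ids.map(function (id) {
    return {
      updateOne: {
        filter: {_id: id},
        update: {$set: {_random_sample: rng()}}   // rng() is called once per document
      }
    };
  });
}

// Usage in the mongo shell:
// var ids = db.products.find({category: "Electronic Devices"}, {_id: 1})
//                      .map(function (d) { return d._id; });
// db.products.bulkWrite(buildRandomSampleOps(ids));
```

Because the random value is generated client-side for each operation, matching documents no longer share one number, at the cost of first fetching their IDs.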
