How to delete documents by query efficiently in mongo?


Question

I have a query that selects documents to be removed. Right now, I remove them manually, like this (using Python):

for id in mycoll.find(query, fields={}):
  mycoll.remove(id)

This does not seem to be very efficient. Is there a better way?

EDIT

OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:

def reduce_duplicates(mydb, max_group_size):
  # 1. Count the group sizes
  res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
  # 2. For each entry from the filter scratch collection having count > max_group_size
  deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
  for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
    key = entry['_id']
    group_size = int(entry['value'])
    # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
    for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
      mydb.static.remove(id)
  return res['counts']['input']

So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:

  1. Map-reduce the data into (key, count) pairs.
  2. Iterate over all the pairs with count > max_group_size.
  3. Query the data by key, sorting it ascending by timestamp (oldest first) and limiting the result to the count - max_group_size oldest records.
  4. Delete each and every found record.
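Steps 3 and 4 above could be sketched with one delete call per group instead of one per document. This is only a sketch under assumptions: `trim_group` is a hypothetical helper name, `collection` is assumed to be a PyMongo 3+ collection (providing `find` and `delete_many`), and `test_date` is the timestamp field used in the code above.

```python
def trim_group(collection, key, group_size, max_group_size):
    """Delete the oldest documents of one group, keeping the newest ones.

    `key` is the query document identifying the group (hypothetical helper).
    Returns the number of documents deleted.
    """
    excess = group_size - max_group_size
    if excess <= 0:
        return 0
    # Step 3: fetch only the _id of the `excess` oldest documents
    # (1 means ascending order, i.e. oldest test_date first).
    cursor = collection.find(key, {"_id": 1}).sort("test_date", 1).limit(excess)
    ids = [doc["_id"] for doc in cursor]
    if not ids:
        return 0
    # Step 4: a single delete for the whole group.
    result = collection.delete_many({"_id": {"$in": ids}})
    return result.deleted_count
```

The find and the delete are two separate operations, so documents inserted between them are not taken into account.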

As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. The last two steps are foreach-found-remove, and this is the important detail of my question that changes everything. I should have been more specific about it, sorry.

Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:

mydb.static.remove(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])

This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:

C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database

Needless to say, the foreach-found-remove approach works and yields the expected results.

Now, I hope I have provided enough context and (hopefully) have restored my lost honour.

Answer

You can use a query to remove all matching documents:

var query = {name: 'John'};
db.collection.remove(query);
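Since the question uses Python, here is a rough PyMongo equivalent of the shell snippet above. It is a sketch that assumes PyMongo 3 or later, where `delete_many` is the replacement for the older `remove` method; `delete_matching` is a hypothetical helper name, not part of any library.

```python
def delete_matching(collection, query):
    """Delete every document matching `query` in a single server-side call.

    `collection` is assumed to be a PyMongo 3+ collection object.
    Returns the number of documents deleted.
    """
    result = collection.delete_many(query)
    return result.deleted_count
```

For example, `delete_matching(mycoll, {"name": "John"})` would replace the find-then-remove loop from the question with one round trip to the server.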

Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.

Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
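One possible way to do this in Python is to fetch a batch of `_id` values and delete that batch, repeating until nothing matches. This is a sketch, not a definitive implementation: `delete_in_chunks` and `chunk_size` are hypothetical names, and `collection` is assumed to be a PyMongo 3+ collection.

```python
def delete_in_chunks(collection, query, chunk_size=1000):
    """Delete documents matching `query` in batches of at most `chunk_size`.

    Returns the total number of documents deleted.
    """
    total = 0
    while True:
        # Fetch only the _id field of the next batch of matching documents.
        cursor = collection.find(query, {"_id": 1}).limit(chunk_size)
        ids = [doc["_id"] for doc in cursor]
        if not ids:
            break
        # Delete exactly that batch, then loop to pick up the next one.
        total += collection.delete_many({"_id": {"$in": ids}}).deleted_count
    return total
```

With 100k matching documents and the default `chunk_size`, this issues 100 deletes of 1k documents each, which gives other operations a chance to run between batches.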
