How to delete documents by query efficiently in mongo?

Views: 72  Published: 2020/5/10 22:11:47  mongodb


Problem description

I have a query that selects documents to be removed. Right now, I remove them manually, like this (using Python):

for id in mycoll.find(query, fields={}):
  mycoll.remove(id)

This does not seem to be very efficient. Is there a better way?
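
When the query needs no sort or limit, one option is to pass it straight to the server so that the matching documents are removed in a single round trip rather than one call per document. The following is a minimal sketch, assuming PyMongo 3.0 or later, where `Collection.delete_many` is available in place of the older `remove`. The helper name `delete_by_query` is hypothetical.

```python
def delete_by_query(coll, query):
    """Delete every document matching `query` in a single server call.

    `coll` is assumed to be a pymongo Collection (PyMongo 3.0+).
    Returns the number of documents that were deleted.
    """
    # One round trip to the server instead of one call per document.
    result = coll.delete_many(query)
    return result.deleted_count
```

With the names from the loop above, this would be called as `delete_by_query(mycoll, query)`.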

EDIT

OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:

from pymongo import ASCENDING

# jstrMeasureGroupMap and jstrMeasureGroupReduce are defined elsewhere in the script (not shown here)
def reduce_duplicates(mydb, max_group_size):
  # 1. Count the group sizes
  res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response=True)
  # 2. For each entry from the filter scratch collection having count > max_group_size
  deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
  for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
    key = entry['_id']
    group_size = int(entry['value'])
    # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
    for id in mydb.static.find(key, limit=group_size - max_group_size, **deleteFindArgs):
      mydb.static.remove(id)
  return res['counts']['input']

So, what does it do? It reduces the number of duplicates to at most max_group_size per key value, keeping only the newest records. It works like this:

1. Map-reduce the data to (key, count) pairs.
2. Iterate over all the pairs with count > max_group_size.
3. Query the data by key, sorting it ascending by the timestamp (oldest first) and limiting the result to the count - max_group_size oldest records.
4. Delete each and every found record.
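
The last two steps can also be sketched as one sorted, limited query that collects only the `_id` values, followed by a single delete with `$in`. This is an illustration under stated assumptions and not the code from the question: it assumes PyMongo 3.0 or later (`find` with a projection, `Cursor.sort`, `Cursor.limit`, `Collection.delete_many`), the helper name `trim_group` and its `date_field` parameter are hypothetical, and `ASCENDING` is written as the literal 1 (the value of `pymongo.ASCENDING`) only to keep the snippet free of imports.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def trim_group(coll, key, group_size, max_group_size, date_field="test_date"):
    """Delete the oldest (group_size - max_group_size) documents matching `key`.

    `coll` is assumed to be a pymongo Collection (PyMongo 3.0+); `key` is the
    query document that identifies one group of duplicates.
    Returns the number of documents that were deleted.
    """
    excess = group_size - max_group_size
    if excess <= 0:
        # Nothing to trim. This guard also avoids limit(0), which MongoDB
        # treats as "no limit".
        return 0

    # Fetch only the _id of the `excess` oldest documents in the group.
    cursor = coll.find(key, {"_id": 1}).sort(date_field, ASCENDING).limit(excess)
    ids = [doc["_id"] for doc in cursor]
    if not ids:
        return 0

    # Remove them with one delete instead of one call per document.
    return coll.delete_many({"_id": {"$in": ids}}).deleted_count
```

Inside the loop over the scratch collection, this would replace the inner foreach-found-remove loop with a call such as `trim_group(mydb.static, key, group_size, max_group_size)`. The sort and limit still happen on the read side, since a delete by query takes no sort or limit of its own.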

As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. The last two steps are foreach-found-remove, and this is the important detail of my question that changes everything. I should have been more specific about it, sorry.

Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:

mydb.static.remove(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])

This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:

C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database

Needless to say, the foreach-found-remove approach works and yields the expected results.

Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
