How to delete documents by query efficiently in mongo?
Question
I have a query that selects the documents to be removed. Right now I remove them one by one, like this (using python):
for id in mycoll.find(query, fields={}):
    mycoll.remove(id)
This does not seem to be very efficient. Is there a better way?
EDIT
OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete python code:
def reduce_duplicates(mydb, max_group_size):
    # 1. Count the group sizes
    res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response=True)
    # 2. For each entry from the filter scratch collection having count > max_group_size
    deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
    for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
        key = entry['_id']
        group_size = int(entry['value'])
        # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
        for id in mydb.static.find(key, limit=group_size - max_group_size, **deleteFindArgs):
            mydb.static.remove(id)
    return res['counts']['input']
So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:
- MR (map-reduce) the data to (key, count) pairs.
- Iterate over all the pairs with count > max_group_size.
- Query the data by key, sorting it ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records.
- Delete each and every found record.
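The steps above can be illustrated on plain in-memory data. This is only a sketch of the selection logic, not the MongoDB code itself: the function name ids_to_delete and the record fields "id" and "key" are made up for the illustration (test_date is the timestamp field from the code above), and a Python list stands in for the collection.

```python
from collections import defaultdict


def ids_to_delete(records, max_group_size):
    """Return the ids of the oldest records in every over-sized group."""
    # Step 1: group the records by key (stands in for the map-reduce count).
    groups = defaultdict(list)
    for rec in records:
        groups[rec["key"]].append(rec)

    doomed = []
    for key, group in groups.items():
        # Step 2: only groups with more than max_group_size records matter.
        excess = len(group) - max_group_size
        if excess <= 0:
            continue
        # Step 3: sort oldest first and take the `excess` oldest records.
        oldest_first = sorted(group, key=lambda r: r["test_date"])
        # Step 4: these are the records that would be deleted.
        doomed.extend(r["id"] for r in oldest_first[:excess])
    return doomed


# Hypothetical sample data: three records for key "a", one for key "b".
records = [
    {"id": 1, "key": "a", "test_date": 3},
    {"id": 2, "key": "a", "test_date": 1},
    {"id": 3, "key": "a", "test_date": 2},
    {"id": 4, "key": "b", "test_date": 5},
]
print(ids_to_delete(records, max_group_size=1))  # prints [2, 3]
```

With max_group_size=1, key "a" has two records too many, so its two oldest records (ids 2 and 3) are selected, while key "b" is left alone.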
As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So, the last two steps are foreach-found-remove, and this is the important detail of my question that changes everything. I should have been more specific about it - sorry.
Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do that with remove? Well, I have tried:
mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])
This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:
C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database
Needless to say, the foreach-found-remove approach works and yields the expected results.
Now, I hope I have provided enough context and (hopefully) have restored my lost honour.
Recommended answer
You can use a query to remove all matching documents:
var query = {name: 'John'};
db.collection.remove(query);
Be wary, though: if the number of matching documents is high, your database might become less responsive while the delete runs. It is often advised to delete documents in smaller chunks.
Let's say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.
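A minimal Python sketch of such chunked deletion could look like the following. It assumes a PyMongo 3+ style collection object, where find accepts a limit keyword and delete_many returns a result with a deleted_count attribute. The helper name delete_in_chunks and the default chunk size are made up for this illustration and are not part of any library.

```python
def delete_in_chunks(coll, query, chunk_size=1000):
    """Delete documents matching `query` in batches of at most `chunk_size`.

    `coll` is assumed to behave like a PyMongo 3+ collection
    (find with a `limit` keyword, delete_many returning deleted_count).
    Returns the total number of deleted documents.
    """
    total = 0
    while True:
        # Fetch only the _id values of the next batch of matching documents.
        ids = [doc["_id"] for doc in coll.find(query, {"_id": 1}, limit=chunk_size)]
        if not ids:
            break
        # Delete exactly that batch with a single command.
        result = coll.delete_many({"_id": {"$in": ids}})
        if result.deleted_count == 0:
            # Nothing was removed (e.g. a concurrent delete); avoid looping forever.
            break
        total += result.deleted_count
    return total
```

Under these assumptions it might be called as delete_in_chunks(db.collection, {"name": "John"}). Each batch costs one find and one delete, so the chunk size trades the number of round trips against how long each delete keeps the server busy.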