Fastest way to remove duplicate documents in MongoDB
Question
I have approximately 1.7M documents in MongoDB (10M+ in the future). Some of them are duplicate entries that I do not want. The documents are structured like this:
{
    _id: 14124412,
    nodes: [
        12345,
        54321
    ],
    name: "Some beauty"
}
A document is a duplicate if it shares at least one node with another document that has the same name. What is the fastest way to remove the duplicates?
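To make the rule concrete, here is a minimal in-memory sketch in plain JavaScript. It is not taken from the question: it assumes the first document seen is kept and that any later document sharing a (name, node) pair with a kept document counts as the duplicate. The function name `findDuplicateIds` and the sample data are hypothetical.

```javascript
// Hypothetical helper: returns the _id of every document that shares at least
// one node with an earlier, kept document of the same name.
function findDuplicateIds(docs) {
  const seen = new Set();      // (name, node) keys of the documents kept so far
  const duplicateIds = [];
  for (const doc of docs) {
    // JSON.stringify yields an unambiguous string key for the (name, node) pair.
    const keys = doc.nodes.map((node) => JSON.stringify([doc.name, node]));
    if (keys.some((key) => seen.has(key))) {
      // Dropped documents are not registered, so they cannot shadow later ones.
      duplicateIds.push(doc._id);
    } else {
      keys.forEach((key) => seen.add(key));
    }
  }
  return duplicateIds;
}

const sample = [
  { _id: 1, nodes: [12345, 54321], name: "Some beauty" },
  { _id: 2, nodes: [54321, 99999], name: "Some beauty" }, // shares 54321 with _id 1
  { _id: 3, nodes: [12345], name: "Other name" },         // same node, different name
  { _id: 4, nodes: [99999], name: "Some beauty" },        // overlaps only the dropped _id 2
];
console.log(findDuplicateIds(sample)); // [ 2 ]
```

Note that which document of a duplicate pair survives depends purely on iteration order, so the order would need to be chosen deliberately.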
Recommended answer
Assuming you want to permanently delete docs that contain a duplicate name + nodes entry from the collection, you can add a unique index with the dropDups: true option:
db.test.ensureIndex({name: 1, nodes: 1}, {unique: true, dropDups: true})
As the docs say, use extreme caution with this, as it will delete data from your database. Back up your database first in case it doesn't do exactly what you expect.
Update
This solution is only valid through MongoDB 2.x, as the dropDups option is no longer available in 3.0 (docs).
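For MongoDB 3.0 and later, one possible substitute is to de-duplicate from the shell. The sketch below is an assumption, not part of the original answer. It assumes a shell whose collections provide find, sort and deleteMany (deleteMany exists since the 3.2 shell; older shells have remove), keeps the document with the lowest _id, and holds every (name, node) key in shell memory, which may be impractical at 10M+ documents. The function name `dropDuplicates` and the `batchSize` parameter are hypothetical.

```javascript
// Hypothetical shell helper; `collection` would be something like db.test.
// Returns the number of documents that were queued for deletion.
function dropDuplicates(collection, batchSize = 1000) {
  const seen = new Set();   // (name, node) keys of the documents kept so far
  let batch = [];           // _ids waiting to be deleted
  let queued = 0;

  function flush() {
    if (batch.length === 0) return;
    // Deleting in batches keeps each $in list small.
    collection.deleteMany({ _id: { $in: batch } });
    queued += batch.length;
    batch = [];
  }

  // Walk the collection in _id order so the lowest _id of each group is kept.
  const cursor = collection.find({}, { name: 1, nodes: 1 }).sort({ _id: 1 });
  while (cursor.hasNext()) {
    const doc = cursor.next();
    const keys = (doc.nodes || []).map((node) => JSON.stringify([doc.name, node]));
    if (keys.some((key) => seen.has(key))) {
      batch.push(doc._id);
      if (batch.length >= batchSize) flush();
    } else {
      keys.forEach((key) => seen.add(key));
    }
  }
  flush();
  return queued;
}
```

In the shell this might be called as dropDuplicates(db.test), after taking a backup. Once the duplicates are gone, a plain unique index on {name: 1, nodes: 1} (without dropDups) could be created to reject new ones.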