Fastest way to remove duplicate documents in MongoDB


Question

I have approximately 1.7M documents in MongoDB (10M+ in the future). Some of them are duplicate entries that I do not want. The documents are structured like this:

{
    _id: 14124412,
    nodes: [
        12345,
        54321
    ],
    name: "Some beauty"
}

A document is a duplicate if it shares at least one node with another document that has the same name. What is the fastest way to remove the duplicates?

Recommended answer

Assuming you want to permanently delete documents that contain a duplicate name + nodes entry from the collection, you can add a unique index with the dropDups: true option. Because nodes is an array, the index is multikey, so every name + node pair must be unique across documents, which matches the duplicate rule in the question:

db.test.ensureIndex({name: 1, nodes: 1}, {unique: true, dropDups: true}) 

As the docs say, use extreme caution with this as it will delete data from your database. Back up your database first in case it doesn't do exactly as you're expecting.
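Before running anything destructive, it can help to preview which documents count as duplicates under the rule in the question. The following is a minimal sketch in plain JavaScript, assuming a sample of the documents has already been loaded into an array. `findDuplicateIds` is a hypothetical helper, not part of MongoDB, and it assumes the first document seen for each name + node pair is the one to keep; which document dropDups actually keeps may differ.

```javascript
// Hypothetical helper (not a MongoDB API): returns the _id of every document
// that shares at least one node with an earlier, kept document of the same name.
function findDuplicateIds(docs) {
  const seen = new Set(); // name + node keys of the documents kept so far
  const duplicateIds = [];
  for (const doc of docs) {
    // One key per (name, node) pair, mirroring the compound index entries.
    const keys = (doc.nodes || []).map((node) => JSON.stringify([doc.name, node]));
    if (keys.some((key) => seen.has(key))) {
      duplicateIds.push(doc._id); // shares a pair with a kept document
    } else {
      keys.forEach((key) => seen.add(key)); // first occurrence: keep it
    }
  }
  return duplicateIds;
}

// Illustrative sample data, based on the document shape in the question.
const sample = [
  { _id: 1, nodes: [12345, 54321], name: "Some beauty" },
  { _id: 2, nodes: [54321, 99999], name: "Some beauty" }, // shares 54321 with _id 1
  { _id: 3, nodes: [12345], name: "Other name" },         // same node, different name
];

console.log(findDuplicateIds(sample)); // [ 2 ]
```

Only the keys of kept documents are recorded, so a document is flagged only when it collides with something that would remain in the collection. For millions of documents this in-memory approach is likely too heavy and is meant only for checking expectations on a sample.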

Update

This solution is only valid through MongoDB 2.x as the dropDups option is no longer available in 3.0 (docs).
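On MongoDB 3.0 and later, one possible substitute (not part of the original answer) is to list the groups of conflicting documents with an aggregation pipeline and then decide which ones to delete. The sketch below assumes the `test` collection and the `name` and `nodes` fields from the question; the variable name `duplicateGroupsPipeline` is illustrative.

```javascript
// Sketch of an aggregation pipeline that lists documents sharing a name + node pair.
const duplicateGroupsPipeline = [
  // Emit one document per element of the nodes array.
  { $unwind: "$nodes" },
  // Collect the _ids of all documents that have the same name and node.
  {
    $group: {
      _id: { name: "$name", node: "$nodes" },
      ids: { $addToSet: "$_id" },
    },
  },
  // Keep only groups with at least two distinct documents (index 1 exists).
  { $match: { "ids.1": { $exists: true } } },
];

// In the mongo shell this could be run as:
//   db.test.aggregate(duplicateGroupsPipeline, { allowDiskUse: true })
```

Each result contains the `_id` values that conflict on one name + node pair; keeping one `_id` per group and removing the rest is left to the caller, and the same backup advice applies before deleting anything.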
