How to remove duplicate values inside a list in MongoDB

Problem description

I have a MongoDB collection. When I run:

db.bill.find({})

I get:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  190215,  147708 ],
    "customer_name" : "abc"
}

I need an operation that removes the duplicate values from bill_codes. The final document should be:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  147708 ],
    "customer_name" : "abc"
}

How can I achieve this in MongoDB?

Recommended answer

You can do this using the aggregation framework as follows:

collection.aggregate([
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "customer_name": 1
    }}
])

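The $setUnion operator is a "set" operator, so its result is a set in which only the unique items are kept. Taking the union of bill_codes with an empty array therefore leaves just the distinct values. Note that the order of the elements in the result is not guaranteed.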
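To make the set semantics concrete, here is a minimal pure-Python sketch of what that expression computes. The helper name set_union is hypothetical and only mimics the operator on plain lists of hashable values. It keeps first-seen order for readability, whereas MongoDB does not specify the order of the $setUnion result.

```python
def set_union(*arrays):
    """Mimic the set semantics of $setUnion on plain Python lists.

    Hypothetical helper for illustration only; it is not part of PyMongo.
    """
    seen = set()
    result = []
    for array in arrays:
        for item in array:
            if item not in seen:  # keep only the first occurrence of each value
                seen.add(item)
                result.append(item)
    return result


bill_codes = [190215, 44124, 190215, 147708]
unique_codes = set_union(bill_codes, [])
print(unique_codes)  # [190215, 44124, 147708]
```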

If you are still using a MongoDB version older than 2.6, you have to do this with $unwind and $addToSet instead:

collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "code": { "$first": "$code" },
        "abbreviation": { "$first": "$abbreviation" },
        "bill_codes": { "$addToSet": "$bill_codes" },
        "customer_name": { "$first": "$customer_name" }
    }}
])

It's not as efficient, but these operators have been supported since version 2.2.

Of course, if you actually want to modify the documents in your collection permanently, you can expand on this and process an update for each document. .aggregate() can return a cursor, and the loop basically follows this shell example:

db.collection.aggregate([
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

For earlier versions it is a little more involved:

db.collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": { 
            "_id": "$_id",
            "bill_code": "$bill_codes"
        },
        "origSize": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id._id",
        "bill_codes": { "$push": "$_id.bill_code" },
        "origSize": { "$sum": "$origSize" },
        "newSize": { "$sum": 1 }
    }},
    { "$project": {
        "bill_codes": 1,
        "same": { "$eq": [ "$origSize", "$newSize" ] }
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

The added operations compare the length of the de-duplicated array with the length of the original array, so that only the documents that actually had duplicates removed are returned for update processing.
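For Python, the 2.6+ shell loop translates fairly directly. The following is a sketch under assumptions: dedupe_with_pipeline is a hypothetical helper name, and collection is assumed to be a PyMongo Collection from version 3.0 or later, where update_one() is available.

```python
# Same pipeline as the shell example: de-duplicate bill_codes and keep only
# the documents whose array actually shrank.
PIPELINE = [
    {"$project": {
        "bill_codes": {"$setUnion": ["$bill_codes", []]},
        "same": {"$eq": [
            {"$size": "$bill_codes"},
            {"$size": {"$setUnion": ["$bill_codes", []]}},
        ]},
    }},
    {"$match": {"same": False}},
]


def dedupe_with_pipeline(collection):
    """Write the de-duplicated array back for each affected document.

    Hypothetical helper; `collection` is assumed to be a PyMongo Collection.
    Returns the number of documents for which an update was issued.
    """
    updated = 0
    for doc in collection.aggregate(PIPELINE):
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"bill_codes": doc["bill_codes"]}},
        )
        updated += 1
    return updated
```

With the collection from the question it would be called as dedupe_with_pipeline(db.bill), where db is assumed to be a PyMongo database handle. Documents in which bill_codes is missing or is not an array would likely make $size fail, so a leading $match stage may be needed for mixed data.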

A note for Python users as well. If you don't care about identifying the documents that contain duplicate array entries and are prepared to "blast" the whole collection with updates, then just use Python's built-in set() in the client code to remove the duplicates. Note that set() does not preserve the original order of the elements:

for doc in collection.find():
    # update_one() replaces the legacy update(), which was removed in PyMongo 4
    collection.update_one(
        { "_id": doc["_id"] },
        { "$set": { "bill_codes": list(set(doc["bill_codes"])) } }
    )

So that's quite simple. Which approach to use depends on which is the greater evil: the cost of finding the documents with duplicates, or the cost of updating every document whether it needs it or not.
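A middle ground, sketched here under assumptions, is to compare lengths on the client and only issue an update when the array actually changed. dedupe_client_side is a hypothetical helper, collection is assumed to be a PyMongo Collection (3.0 or later), and dict.fromkeys() is used instead of set() so that the order of first occurrences is kept (dicts preserve insertion order from Python 3.7).

```python
def dedupe_client_side(collection, field="bill_codes"):
    """Remove duplicate array values, updating only documents that need it.

    Hypothetical helper; `collection` is assumed to be a PyMongo Collection.
    Returns the number of documents for which an update was issued.
    """
    updated = 0
    # Fetch only _id and the array field to keep the transfer small.
    for doc in collection.find({}, {field: 1}):
        codes = doc.get(field) or []
        # dict.fromkeys keeps the first occurrence of each value, in order.
        unique = list(dict.fromkeys(codes))
        if len(unique) != len(codes):
            collection.update_one({"_id": doc["_id"]}, {"$set": {field: unique}})
            updated += 1
    return updated
```

This still reads every document, but it writes only to the ones that contained duplicates.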

That at least covers the techniques.
