How to remove duplicate values inside a list in MongoDB

Question

I have a MongoDB collection. When I do:

db.bill.find({})

I get:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  190215,  147708 ],
    "customer_name" : "abc"
}

I need an operation that removes the duplicate values from bill_codes, so that the document becomes:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  147708 ],
    "customer_name" : "abc"
}

How can I achieve this in MongoDB?

Answer

Well, you can do this using the aggregation framework as follows:

collection.aggregate([
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] }
    }}
])

The $setUnion operator is a "set" operator: the array is treated as a "set", so only the "unique" items are kept.
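
For example, a minimal pymongo sketch of running that pipeline from Python (the connection string and the test/bill database and collection names are assumptions for illustration):

from pymongo import MongoClient

# Assumed connection details; adjust to your deployment.
client = MongoClient("mongodb://localhost:27017")
bill = client["test"]["bill"]

pipeline = [
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "customer_name": 1,
        # $setUnion against an empty array treats bill_codes as a set,
        # so duplicate entries are dropped (result order is not guaranteed).
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] }
    }}
]

for doc in bill.aggregate(pipeline):
    print(doc["bill_codes"])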

If you are still using a MongoDB version older than 2.6, you would have to do this with $unwind and $addToSet instead:

collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "code": { "$first": "$code" },
        "abbreviation": { "$first": "$abbreviation" },
        "bill_codes": { "$addToSet": "$bill_codes" }
    }}
])

It's not as efficient, but these operators have been supported since version 2.2.

Of course, if you actually want to modify your collection documents permanently, you can expand on this and process an update for each document accordingly. You can retrieve a "cursor" from .aggregate(), but basically follow this shell example:

db.collection.aggregate([
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})
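
Driving the same idea from Python, a hedged pymongo sketch could batch the updates with bulk_write rather than issuing one update per document (bulk_write assumes pymongo 3.x; the connection details and collection name are illustrative assumptions):

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
bill = client["test"]["bill"]

# Select only the documents whose de-duplicated array is smaller
# than the original, i.e. the ones that actually contain duplicates.
pipeline = [
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": False } }
]

requests = [
    UpdateOne({ "_id": doc["_id"] }, { "$set": { "bill_codes": doc["bill_codes"] } })
    for doc in bill.aggregate(pipeline)
]

if requests:
    bill.bulk_write(requests)  # one round trip instead of one update per document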

For earlier versions it's a bit more involved:

db.collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": { 
            "_id": "$_id",
            "bill_code": "$bill_codes"
        },
        "origSize": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id._id",
        "bill_codes": { "$push": "$_id.bill_code" },
        "origSize": { "$sum": "$origSize" },
        "newSize": { "$sum": 1 }
    }},
    { "$project": {
        "bill_codes": 1,
        "same": { "$eq": [ "$origSize", "$newSize" ] }
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

The operations added there compare whether the "de-duplicated" array has the same length as the original array, so only the documents that actually had "duplicates" removed are returned for update processing.

Probably should add the "for Python" note here as well. If you don't care about "identifying" the documents that contain duplicate array entries and are prepared to "blast" the whole collection with updates, then just use Python's set() in the client code to remove the duplicates:

for doc in collection.find():
    collection.update(
        { "_id": doc["_id"] },
        { "$set": { "bill_codes": list(set(doc["bill_codes"])) } }
    )
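
One caveat: set() gives no ordering guarantee. If the original order of the codes matters, a possible variant (a sketch assuming pymongo 3.x for update_one and Python 3.7+ for ordered dicts) deduplicates with dict.fromkeys, which keeps the first occurrence of each value:

for doc in collection.find():
    collection.update_one(
        { "_id": doc["_id"] },
        # dict.fromkeys preserves first occurrences in their original order.
        { "$set": { "bill_codes": list(dict.fromkeys(doc["bill_codes"])) } }
    )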

So that's quite simple, and it depends on which is the greater evil: the cost of finding the documents with duplicates, or the cost of updating every document whether it needs it or not.

That at least covers the techniques.
