Eliminate duplicates in MongoDB with a specific sort

Views: 57 | Published: 2021/6/3 20:29:20 | Tags: mongodb, drop-duplicates


Problem description

I have a database whose entries correspond to work contracts. In MongoDB I have aggregated them by worker. In simplified form, the original data looks something like this:

{
    "_id" : ObjectId("5ea995662a40c63b14266071"),
    "worker" : "1070",
    "employer" : "2116096",
    "start" : ISODate("2018-01-11T01:00:00.000+01:00"),
    "ord_id" : 0
},
{
    "_id" : ObjectId("5ea995662a40c63b14266071"),
    "worker" : "1070",
    "employer" : "2116096",
    "start" : ISODate("2018-01-11T01:00:00.000+01:00"),
    "ord_id" : 1
},
{
    "_id" : ObjectId("5ea995662a40c63b14266072"),
    "worker" : "1071",
    "employer" : "2116055",
    "start" : ISODate("2019-01-03T01:00:00.000+01:00"),
    "ord_id" : 2
},
{
    "_id" : ObjectId("5ea995662a40c63b14266072"),
    "worker" : "1071",
    "employer" : "2116056",
    "start" : ISODate("2019-01-03T01:00:00.000+01:00"),
    "ord_id" : 3
},

I then regrouped the entries by worker:

{
    "_id" : ObjectId("5ea995662a40c63b14266071"),
    "worker" : "1070",
    "contracts" : [
             {
               "employer" : "2116096",
               "start" : ISODate("2018-01-11T01:00:00.000+01:00"),
               "ord_id" : 0
             },
             {
               "employer" : "2116096",  
               "start" : ISODate("2018-01-11T01:00:00.000+01:00"),
               "ord_id" : 1
             } // The employer and start date match the previous entry, so this is a duplicate!
         ]
},
{
    "_id" : ObjectId("5ea995662a40c63b14266072"),
    "worker" : "1071",
    "contracts" : [
             {
               "employer" : "2116055",
               "start" : ISODate("2019-01-03T01:00:00.000+01:00"),
               "ord_id" : 2
             },
             {
               "employer" : "2116056",
               "start" : ISODate("2019-01-04T01:00:00.000+01:00"),
               "ord_id" : 3
             }
         ]
}
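
The regrouping by worker shown above can be sketched in plain JavaScript, without a database connection. This is only an illustration under assumptions: `groupByWorker` is a hypothetical helper name, the sample rows are copied from the simplified example, and the nested field is assumed to be called `contracts`. Inside MongoDB, a similar result is usually obtained with a `$group` stage that uses `$push`.

```javascript
// Hypothetical helper: turns flat contract rows into one document per worker,
// with that worker's contracts collected in a nested array.
function groupByWorker(rows) {
  const byWorker = new Map();
  for (const { worker, employer, start, ord_id } of rows) {
    if (!byWorker.has(worker)) {
      byWorker.set(worker, { worker, contracts: [] });
    }
    byWorker.get(worker).contracts.push({ employer, start, ord_id });
  }
  // Map preserves insertion order, so workers appear in first-seen order.
  return [...byWorker.values()];
}

// Sample rows taken from the simplified example in the question.
const rows = [
  { worker: "1070", employer: "2116096", start: new Date("2018-01-11T01:00:00.000+01:00"), ord_id: 0 },
  { worker: "1070", employer: "2116096", start: new Date("2018-01-11T01:00:00.000+01:00"), ord_id: 1 },
  { worker: "1071", employer: "2116055", start: new Date("2019-01-03T01:00:00.000+01:00"), ord_id: 2 },
  { worker: "1071", employer: "2116056", start: new Date("2019-01-03T01:00:00.000+01:00"), ord_id: 3 },
];

const grouped = groupByWorker(rows);
console.log(JSON.stringify(grouped, null, 2));
```

Each worker ends up with a single document, which is why duplicates then show up as repeated entries inside the `contracts` array.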

Some contracts in the original table were recorded twice, so I need to keep only one copy of each. More specifically (as in the example), I consider two contracts of the same worker to be duplicates when they start on the same day with the same employer. However, which duplicate to keep and which to drop is not up to me. There is a field named 'ord_id' (which I generated while loading the data into MongoDB) that is numeric and unique, so among duplicates it is the only value that actually differs. Among duplicates, I have to keep the one with the highest 'ord_id'. Following this thread, I wrote:

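db.mycollection.aggregate([
    // One output document per contract
    { $unwind: "$contracts" },
    // Group the contracts by worker and start date
    { $group: {
        _id: { WORKER: "$worker", START: "$contracts.start" },
        dups: { $addToSet: "$_id" },
        ord_id: { $addToSet: "$contracts.ord_id" },
        count: { $sum: 1 }
    } },
    // Keep only the groups that contain more than one contract
    { $match: { count: { $gt: 1 } } },
    { $sort: { count: -1, ord_id: -1 } }
], { allowDiskUse: true }).forEach(function (doc) {
    doc.dups.shift();                                     // spare the first _id in each group
    db.mycollection.remove({ _id: { $in: doc.dups } });   // remove the remaining ones
});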

Although I am already having trouble removing duplicates when I aggregate by contract, what I want is to keep, within each set of duplicates, the one with the highest value of 'ord_id'. I am still new to MongoDB and still switching mentally from a mostly relational (SQL) approach. Apologies for the silly question.
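
The rule described in the question (same worker, same employer, same start date, keep the highest 'ord_id') can be sketched on a single worker document in plain JavaScript. This is a minimal sketch, not an answer from the original thread: `dedupeContracts` is a hypothetical helper name, and it assumes the nested array is called `contracts` and that `start` is a date value.

```javascript
// Hypothetical helper: for each (employer, start) pair within one worker,
// keeps only the contract with the highest ord_id.
function dedupeContracts(contracts) {
  const best = new Map();
  for (const c of contracts) {
    // Compare dates by timestamp, since two Date objects are never === equal.
    const key = `${c.employer}|${new Date(c.start).getTime()}`;
    const kept = best.get(key);
    if (kept === undefined || c.ord_id > kept.ord_id) {
      best.set(key, c);
    }
  }
  // Return the survivors ordered by ord_id for a predictable result.
  return [...best.values()].sort((a, b) => a.ord_id - b.ord_id);
}

// Sample documents based on the simplified example in the question.
const worker1070 = {
  worker: "1070",
  contracts: [
    { employer: "2116096", start: new Date("2018-01-11T01:00:00.000+01:00"), ord_id: 0 },
    { employer: "2116096", start: new Date("2018-01-11T01:00:00.000+01:00"), ord_id: 1 },
  ],
};
const worker1071 = {
  worker: "1071",
  contracts: [
    { employer: "2116055", start: new Date("2019-01-03T01:00:00.000+01:00"), ord_id: 2 },
    { employer: "2116056", start: new Date("2019-01-04T01:00:00.000+01:00"), ord_id: 3 },
  ],
};

console.log(dedupeContracts(worker1070.contracts).map(c => c.ord_id)); // [ 1 ]
console.log(dedupeContracts(worker1071.contracts).map(c => c.ord_id)); // [ 2, 3 ]
```

In the mongo shell, one possible way to apply this is to iterate over the collection and write the cleaned array back with `updateOne` and `$set`. That step is an assumption about the workflow and is best tried on a copy of the data first. Because the worker is already the document, the key only needs the employer and the start date.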
