Removing duplicates in MongoDB not behaving as expected


Question

I have a table in MongoDB that, I have realized, contains duplicates because of a data-parsing update that was required by a change in the underlying source data.

Because of the change in the source, the code behaved unexpectedly and inserted many duplicates.

The following query should return a single document:

db.opts.find({
  $query: {
    ticker: "VXX",
    date: 20150423,
    callPut: "P",
    Strike: 27,
    maturity: 20150424
  },
  $orderby: {
    maturity: 1
  }
})

Yet because of the bug in the code, I unfortunately have multiple entries for this observation. One of them looks like this:

{
  "_id": ObjectId("55396c1c44fea47bde858c78"),
  "date": 20150423,
  "ticker": "VXX",
  "callPut": "P",
  "Last": 6.1,
  "Vol": 25,
  "Chg": 0.43,
  "maturity": 20150424,
  "Symbol": "VXX150424P00027000",
  "Open Int": 809,
  "Strike": 27,
  "Ask": 6.1,
  "Bid": 5.85
}

Now my goal is to remove the duplicates that share certain fields.

I tried running the following command:

db.opts.ensureIndex({
  date: 1,
  ticker: 1,
  callPut: 1,
  maturity: 1,
  Symbol: 1,
  Strike: 1
}, {
  unique: true,
  dropDups: true
})

Yet the duplicates were not dropped.

I also tried:

db.opts.createIndex({
  date: 1,
  ticker: 1,
  callPut: 1,
  maturity: 1,
  Symbol: 1,
  Strike: 1
}, {
  unique: true,
  dropDups: true
})

I have no indexes defined on these fields other than what is shown above. The collection is quite vanilla: I created it and insert data daily, nothing else yet.

The duplicated data looks like this:

> db.opts.find({$query:{ticker:"VXX",date:20150423,callPut:"P",Strike:27}})
{ "_id" : ObjectId("55396c1c44fea47bde858c78"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1c44fea47bde858cd1"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1c44fea47bde858d2a"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1d44fea47bde858d83"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1d44fea47bde858ddc"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1d44fea47bde858e35"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1e44fea47bde858e8e"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1e44fea47bde858ee7"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1e44fea47bde858f40"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1f44fea47bde858f99"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c1f44fea47bde858ff2"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2044fea47bde85904b"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2044fea47bde8590a4"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2044fea47bde8590fd"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2144fea47bde859156"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2144fea47bde8591af"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2244fea47bde859208"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2244fea47bde859261"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2244fea47bde8592ba"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
{ "_id" : ObjectId("55396c2344fea47bde859313"), "date" : 20150423, "ticker" : "V
XX", "callPut" : "P", "Last" : 6.1, "Vol" : 25, "Chg" : 0.43, "maturity" : 20150
424, "Symbol" : "VXX150424P00027000", "Open Int" : 809, "Strike" : 27, "Ask" : 6
.1, "Bid" : 5.85 }
Type "it" for more
>

How can I remove these duplicates?

Recommended answer

The dropDups option is no longer available as of MongoDB 3.0, but you can do this pretty easily in the shell with a little script that iterates over the whole collection and removes any doc whose key values duplicate those of a doc already seen:

var keys = {};
db.opts.find().forEach(function(doc) {
    // Build a composite key from the fields that together must be unique.
    // The separator keeps adjacent values from running into each other.
    var key = [doc.date, doc.ticker, doc.callPut, doc.maturity,
               doc.Symbol, doc.Strike].join('|');
    if (keys[key]) {
        // A doc with this key has already been seen, so remove this doc.
        db.opts.remove({_id: doc._id});
    } else {
        // First doc with this key: remember it and keep the doc.
        keys[key] = true;
    }
});
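
As a dry run before deleting anything, the same composite-key idea can be applied to plain objects to list which `_id` values would be removed. The sketch below is only an illustration and is not part of the original answer: the helper name `findDuplicateIds`, the sample documents, and their `_id` values are all made up, and it works on an in-memory array rather than on the collection itself.

```javascript
// Hypothetical helper (not from the original answer): returns the _id of
// every document whose composite key has already been seen earlier in docs.
function findDuplicateIds(docs, fields) {
  const seen = new Set();
  const duplicateIds = [];
  for (const doc of docs) {
    // JSON.stringify keeps field boundaries and value types unambiguous.
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (seen.has(key)) {
      duplicateIds.push(doc._id);
    } else {
      seen.add(key);
    }
  }
  return duplicateIds;
}

// Illustrative data only; the _id values are invented for this sketch.
const sample = [
  { _id: 1, date: 20150423, ticker: "VXX", callPut: "P", maturity: 20150424,
    Symbol: "VXX150424P00027000", Strike: 27 },
  { _id: 2, date: 20150423, ticker: "VXX", callPut: "P", maturity: 20150424,
    Symbol: "VXX150424P00027000", Strike: 27 },
  { _id: 3, date: 20150423, ticker: "VXX", callPut: "P", maturity: 20150424,
    Symbol: "VXX150424P00028000", Strike: 28 },
];
const uniqueFields = ["date", "ticker", "callPut", "maturity", "Symbol", "Strike"];

// The second document repeats the first, so only its _id is reported.
console.log(findDuplicateIds(sample, uniqueFields));
```

In the shell, the input array could come from `db.opts.find().toArray()`, assuming the collection fits in memory, and the reported ids can be inspected before anything is removed.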

Obviously, make a backup before doing this in case it doesn't work exactly as you expect.
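
Once the duplicates are gone, a unique index on the same fields (the index from the question, minus `dropDups`) should make the server reject further duplicate inserts. The following is a minimal sketch under that assumption: the wrapper `addUniqueIndex` and the stand-in `fakeCollection` are hypothetical and exist only so the snippet runs without a server.

```javascript
// Fields that together must identify a single document (from the question).
const uniqueKey = {
  date: 1, ticker: 1, callPut: 1, maturity: 1, Symbol: 1, Strike: 1,
};

// Hypothetical wrapper: `collection` is anything exposing
// createIndex(keys, options), such as db.opts in the mongo shell.
function addUniqueIndex(collection) {
  return collection.createIndex(uniqueKey, { unique: true });
}

// Stand-in for a real collection; it only records the arguments it receives.
const calls = [];
const fakeCollection = {
  createIndex(keys, options) {
    calls.push({ keys, options });
    return "recorded";
  },
};

addUniqueIndex(fakeCollection);
console.log(calls[0].options);
```

In the shell this would be `addUniqueIndex(db.opts)`, or simply the `createIndex` call written out directly. The index build is expected to fail if any duplicates remain, which also serves as a check that the cleanup was complete.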
