How can I delete duplicates in MongoDb?
Question
I have a large collection (~2.7 million documents) in MongoDB, and there are a lot of duplicates. I tried running ensureIndex({id:1}, {unique:true, dropDups:true}) on the collection. Mongo churns away at it for a while before it fails with the error "too many dups on index build with dropDups=true".
How can I add the index and get rid of the duplicates? Or the other way around, what's the best way to delete some dups so that mongo can successfully build the index?
For bonus points, why is there a limit to the number of dups that can be dropped?
Solution
For bonus points, why is there a limit to the number of dups that can be dropped?
MongoDB is likely doing this to defend itself. If you use dropDups on the wrong field, you could hose the entire dataset and lock down the DB with delete operations (which are "as expensive" as writes).
How can I add the index and get rid of the duplicates?
So the first question is: why are you creating a unique index on the id field?
MongoDB creates a default _id field that is automatically unique and indexed. By default, MongoDB populates _id with an ObjectId; however, you can override this with whatever value you like. So if you have a ready set of ID values, you can use those.
If you cannot re-import the values, then copy them to a new collection while changing id into _id. You can then drop the old collection and rename the new one. (Note that you will get a bunch of "duplicate key errors"; make sure your code catches and ignores them.)
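The copy step described above could look roughly like the following mongo shell sketch. It is only an illustration under stated assumptions: the function name copyWithIdAsKey and the stats object are made up for this example, every document is assumed to have an id field, and the duplicate key error is assumed to surface as a thrown exception whose code property is 11000 (which is how mongosh reports it; older shells or drivers may report it differently).

```javascript
// Sketch for the mongo shell. `source` and `target` are collection handles,
// e.g. db.items and db.items_dedup (both collection names are hypothetical).
function copyWithIdAsKey(source, target) {
  var stats = { copied: 0, skipped: 0 };

  source.find().forEach(function (doc) {
    // Shallow-copy the document, then move the application-level `id`
    // into `_id` so the built-in unique index on `_id` rejects duplicates.
    var copy = Object.assign({}, doc);
    copy._id = doc.id;
    delete copy.id;

    try {
      target.insertOne(copy);
      stats.copied += 1;
    } catch (e) {
      // Assumption: 11000 is the duplicate key error code.
      // Anything else is a real failure and is rethrown.
      if (e.code !== 11000) {
        throw e;
      }
      stats.skipped += 1;
    }
  });

  return stats;
}
```

In the shell this might be called as copyWithIdAsKey(db.items, db.items_dedup), followed by db.items.drop() and db.items_dedup.renameCollection("items") once the counts look right. The first document seen for each id wins, and later duplicates are skipped. This route does not depend on dropDups, which was removed from index creation in MongoDB 3.0.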