_id(在mongo)中的重复文件 [英] Duplicate documents on _id (in mongo)

查看:135
本文介绍了_id(在mongo)中的重复文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个分层的mongo系列,超过一百五十万个文件。我使用_id列作为分片键,该列中的值是整数(而不是ObjectIds)。



我对这个集合做了很多写操作,使用Perl驱动程序(插入,更新,删除,保存)和mongoimport。



我的问题是,在某种程度上,我在同一个_id上有重复的文档。



我删除了重复的内容,但其他的仍然出现。



你有什么想法可以来自哪里,或者我应该开始看什么?
(另外,我试图复制一个较小的测试集合,但不会插入任何重复项,无论执行什么写操作)。

解决方案

这实际上并不是Perl驱动程序的问题,它与分片的特性有关。 MongoDB只能在创建时在单个分片上的文档中强制执行唯一性,因此默认索引不需要唯一性。



MongoDB:配置Sharding 文档特别提到:




  • 当您分割集合时,必须指定分片键。如果集合中有数据,mongo将需要先创建索引(加快分块过程);


  • 您可以使用{unique:true}选项确保基础索引强制执行唯一性,只要唯一索引是分片键的前缀。


  • 如果未使用unique:true选项,则分片键不具有是唯一的。



I have a sharded mongo collection, with over 1.5 mil documents. I use the _id column as a shard key, and the values in this column are integers (rather than ObjectIds).

I do a lot of write operations on this collection, using the Perl driver (insert, update, remove, save) and mongoimport.

My problem is that somehow, I have duplicate documents on the same _id. From what I've read, this shouldn't be possible.

I've removed the duplicates, but others still appear.

Do you have any ideas where could they come from, or what should I start looking at? (Also, I've tried to replicate this on a smaller, test collection, but no duplicates are inserted, no matter what write operation I perform).

解决方案

This actually isn't a problem with the Perl driver .. it is related to the characteristics of sharding. MongoDB is only able to enforce uniqueness among the documents located on a single shard at the time of creation, so the default index does not require uniqueness.

In the MongoDB: Configuring Sharding documentation there is specific mention that:

  • When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be created upfront (it speeds up the chunking process); otherwise, an index will be automatically created for you.

  • You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key.

  • If the "unique: true" option is not used, the shard key does not have to be unique.

这篇关于_id(在mongo)中的重复文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆