MongoDB: which part of an ObjectId is most likely to be unique?


Question

In my app I'm letting Mongo generate order IDs via its ObjectId method.

But in user testing we've had some concerns that the order IDs are 'intimidating' to people: if you need to discuss your order with someone over the telephone, reading out 24 alphanumeric characters is a bit tedious.

At the same time, I don't really want to have to store two different IDs, one 'human-accessible' and one used by Mongo internally.

So my question is this: is there a way to choose a substring of length 6 or even 8 of the Mongo ObjectId string that I could be fairly sure would be unique?

For example, if I have a Mongo ObjectId like this

id = '4b28dcb61083ed3c809e0416'

maybe I could take

human_id = id.substr(0,7);

and be sure that I'd always get unique IDs for my orders...

The advantage, of course, is that these are orders, so they are human-created and there aren't millions of them per millisecond. On the other hand, it would really be a problem if two orders had the same shortened ID...

--- clearer explanation ---

I guess a better way to ask my question would be this:

If I decide, for example, to use just the last 6 characters of a Mongo ID, is there some kind of measure of the 'probability' that these 6 characters alone would repeat in a given week?

Given a certain number of Mongo instances running in parallel, a certain number of users during the week, etc.

Recommended answer

If you have multiple web servers, with multiple processes, then there really isn't anything you can remove without losing uniqueness.

If you look at the nature of an ObjectId:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.
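
To make the layout concrete, here is a minimal JavaScript sketch that slices the example id from the question into those four fields. The function name splitObjectId is made up for illustration, and the field boundaries simply follow the layout listed above. Newer MongoDB versions replace the machine identifier and process id with a single 5-byte random value, so treat the two middle fields as an assumption that depends on how the id was generated.

```javascript
// Split a 24-character ObjectId hex string into the four fields
// described above (hypothetical helper, not part of any driver).
function splitObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected 24 hex characters');
  }
  return {
    // 4 bytes: seconds since the Unix epoch, converted to a Date
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // 3 bytes: machine identifier, kept as hex
    machine: hex.slice(8, 14),
    // 2 bytes: process id
    processId: parseInt(hex.slice(14, 18), 16),
    // 3 bytes: counter
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = splitObjectId('4b28dcb61083ed3c809e0416');
console.log(parts.timestamp.toISOString(), parts.machine, parts.processId, parts.counter);
```

Seen this way, any short substring covers only one or two of the fields, which is why the sections below argue against trimming.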

You'll see there's not much there that you could safely remove. As the first 4 bytes are time, it would be challenging to implement an algorithm that removed portions of the timestamp in a clean and safe way.

The machine identifier and process identifier are used in cases where there are multiple servers and/or processes acting as clients to the database server. If you dropped either of those, you could end up with duplicates again. The counter in the last 3 bytes, which starts at a random value, makes sure that two identifiers generated on the same machine within the same process are unique, even when requested frequently.

If you were using it as an order ID and you want assured uniqueness, I wouldn't trim anything away from the 12-byte number, as it was carefully designed to provide a robust and efficient distributed mechanism for generating unique numbers when there are many connected database clients.

If you took the last 5 characters of the ObjectId ..., and in a given period, what's the probability of conflict?

  • process ID
  • counter

The probability of conflict is high. The process ID may remain the same through the entire period, and the other number is just an incrementing counter, so a truncated piece of it wraps around quickly (three hex characters, for example, repeat after 4,096 orders). But if the process recycles, then you also have the chance of a conflict with older orders, etc. And if you're talking about multiple database clients, the chances increase as well. I just wouldn't try to trim the number. It's not worth the unhappy customers trying to place orders.
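
As a rough check on how quickly a shortened id runs out of room, the sketch below counts the distinct values a hex suffix can hold and shows two full ids colliding once they are shortened. The helper names suffixSpace and shortId are hypothetical, and the second id is made up for illustration; it is not taken from any real system.

```javascript
// Number of distinct values that a suffix of n hex characters can hold
// before it is forced to repeat.
function suffixSpace(hexChars) {
  return 16 ** hexChars;
}

// Hypothetical shortening rule: keep only the last n hex characters.
function shortId(objectIdHex, n) {
  return objectIdHex.slice(-n);
}

// The id from the question, plus a made-up id from a different time,
// machine, and process whose counter happens to have the same value.
const a = '4b28dcb61083ed3c809e0416';
const b = '4b2a1f00aa11bb22229e0416';

console.log(suffixSpace(3)); // 4096
console.log(suffixSpace(5)); // 1048576
console.log(suffixSpace(6)); // 16777216
console.log(a === b);                         // false: the full ids differ
console.log(shortId(a, 6) === shortId(b, 6)); // true: the short ids collide
```

Because each client process starts its counter at its own random value, nothing prevents two processes from passing through the same counter values in the same week, and the shortened id keeps nothing that would tell them apart.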

Even the timestamp and the random seed value aren't sufficient when there are multiple database clients generating ObjectIds. As you start to look at the various pieces, especially in the context of a farm of database clients, you should see why the pieces are there, and why removing them could lead to a meltdown in ObjectId generation.

I'd suggest you implement an algorithm to create a unique number and store it in the database. It's simple enough to do. It does impact performance a bit, but it's safe.
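
One common way to do this is a counter document that is incremented atomically. The sketch below is only an illustration under stated assumptions: `counters` is assumed to be a collection object from the MongoDB Node.js driver, and the function name nextOrderNumber, the counter name 'orders', and the field name seq are all made up here. It is not necessarily the approach the linked answer describes.

```javascript
// Atomically increment a named counter document and return its new value.
// Assumes `counters` exposes the Node.js driver's findOneAndUpdate method.
async function nextOrderNumber(counters, name = 'orders') {
  const result = await counters.findOneAndUpdate(
    { _id: name },                 // one document per sequence
    { $inc: { seq: 1 } },          // atomic increment on the server
    { upsert: true, returnDocument: 'after' } // create on first use, return updated doc
  );
  // Some driver versions wrap the document as { value: doc };
  // others return the document directly. Handle both shapes.
  const doc = result && result.value !== undefined ? result.value : result;
  return doc.seq;
}

// Example use (requires a connected client, so it is left as a comment):
//   const orderNumber = await nextOrderNumber(db.collection('counters'));
//   await db.collection('orders').insertOne({ orderNumber, /* ... */ });
```

The extra round trip per order is the performance cost mentioned above. In exchange, the number is short, easy to read over the phone, and unique as long as every client goes through the same counter document.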

I wrote this answer a while ago about the challenges of using an ObjectId in a URL. It includes a link to how to create a unique auto-incrementing number using MongoDB.
