大型分布式系统中ObjectId与UUID的碰撞概率 [英] Collision probability of ObjectId vs UUID in a large distributed system

查看:804
本文介绍了大型分布式系统中ObjectId与UUID的碰撞概率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑到UUID rfc 4122(16字节)比MongoDB ObjectId(12字节)大得多,我试图找出它们的碰撞概率如何比较.

Considering that an UUID rfc 4122 (16 bytes) is much larger than a MongoDB ObjectId (12 bytes), I am trying to find out how their collision probability compare.

我知道这很可能不太可能,但是就我而言,大多数ID都是在大量移动客户端中生成的,而不是在一组有限的服务器中生成的. 我想知道在这种情况下是否存在合理的担忧.

I know that is something around quite unlikely, but in my case most ids will be generated within a large number of mobile clients, not within a limited set of servers. I wonder if in this case, there is a justified concern.

与所有ID由少数客户端生成的正常情况相比:

Compared to the normal case where all ids are generated by a small number of clients:

  • 自文档创建以来,可能要花费数月的时间才能检测到冲突
  • ID是从更大的客户群中产生的
  • 每个客户端的ID生成率较低

推荐答案

在我的情况下,大多数ID将在大量的移动客户端中生成,而不是在一组有限的服务器中生成.我想知道在这种情况下是否存在合理的担忧.

in my case most ids will be generated within a large number of mobile clients, not within a limited set of servers. I wonder if in this case, there is a justified concern.

对我来说,这听起来像是很糟糕的体系结构.您使用的是两层体系结构吗?为什么移动客户端可以直接访问数据库?您真的要依靠基于网络的安全性吗?

That sounds like very bad architecture to me. Are you using a two-tier architecture? Why would the mobile clients have direct access to the db? Do you really want to rely on network-based security?

无论如何,关于碰撞概率的一些思考:

Anyway, some deliberations about the collision probability:

UUID和ObjectId都不依赖于它们的绝对大小,即两者都不是随机数,但是它们遵循试图系统地降低冲突概率的方案.对于 ObjectId,其结构为:

Neither UUID nor ObjectId rely on their sheer size, i.e. both are not random numbers, but they follow a scheme that tries to systematically reduce collision probability. In case of ObjectIds, their structure is:

    从Unix时代起
  • 4字节秒数
  • 3字节机器ID
  • 2个字节的进程ID
  • 3字节计数器
  • 4 byte seconds since unix epoch
  • 3 byte machine id
  • 2 byte process id
  • 3 byte counter

这意味着,与UUID相反,ObjectId是单调的(在一秒钟内除外),这可能是它们最重要的属性.单调索引将使B树的填充效率更高,它允许按id进行分页,并允许按id进行默认排序"以使光标稳定,当然,它们带有易于提取的时间戳.这些是您应该意识到的优化,它们可能是巨大的.

This means that, contrary to UUIDs, ObjectIds are monotonic (except within a single second), which is probably their most important property. Monotonic indexes will cause the B-Tree to be filled more efficiently, it allows paging by id and allows a 'default sort' by id to make your cursors stable, and of course, they carry an easy-to-extract timestamp. These are the optimizations you should be aware of, and they can be huge.

从其他3个组件的结构中可以看出,如果您在单个进程中执行> 1k次插入/秒(实际上是不可能的,甚至不是来自服务器),冲突的可能性就很大.的计算机超过10个(请参阅生日问题),或者一台计算机上的进程数过大(然后又不是随机数,但它们在计算机上确实是唯一的,但必须缩短)到两个字节).

As you can see from the structure of the other 3 components, collisions become very likely if you're doing > 1k inserts/s on a single process (not really possible, not even from a server), or if the number of machines grows past about 10 (see birthday problem), or if the number of processes on a single machine grows too large (then again, those aren't random numbers, but they are truly unique on a machine, but they must be shortened to two bytes).

自然地,要发生冲突,它们必须在所有这些方面都匹配.因此,即使两台机器具有相同的机器哈希,它仍然需要客户端使用相同的计数器插入值具有完全相同的秒数和相同的进程ID,但是是的,这些值可能会发生冲突.

Naturally, for a collision to occur, they must match in all these aspects, so even if two machines have the same machine hash, it'd still require a client to insert with the same counter value in the exact same second and the same process id, but yes, these values could collide.

这篇关于大型分布式系统中ObjectId与UUID的碰撞概率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆