Distributed sequence number generation?


Question

I've generally implemented sequence number generation using database sequences in the past.

For example, using the Postgres SERIAL type: http://www.neilconway.org/docs/sequences/

I'm curious, though, how to generate sequence numbers for large distributed systems where there is no database. Does anybody have any experience with, or suggestions for, a best practice for generating sequence numbers in a thread-safe manner for multiple clients?

Answer

OK, this is a very old question, which I'm first seeing now.

You'll need to differentiate between sequence numbers and unique IDs that are (optionally) loosely sortable by a specific criterion (typically generation time). True sequence numbers imply knowledge of what all other workers have done, and as such require shared state. There is no easy way of doing this in a distributed, high-scale manner. You could look into things like network broadcasts, windowed ranges for each worker, and distributed hash tables for unique worker IDs, but it's a lot of work.

Unique IDs are another matter; there are several good ways of generating unique IDs in a decentralized manner:
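To make the "windowed ranges for each worker" idea a bit more concrete, here is a minimal in-process sketch. The names `RangeAllocator` and `Worker` and the block size are my own illustration, not part of any library. In a real deployment the allocator would be a small networked service (the one piece of shared state), and note that the resulting numbers are unique but not globally ordered or gap-free.

```python
import threading


class RangeAllocator:
    """Hypothetical shared allocator: hands out disjoint blocks of numbers."""

    def __init__(self, block_size=1000):
        self._lock = threading.Lock()
        self._next_start = 0
        self._block_size = block_size

    def allocate(self):
        # Return the half-open range [start, end) and reserve it for the caller.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return start, start + self._block_size


class Worker:
    """Draws numbers from its own block and only contacts the allocator
    when the block is used up."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._lock = threading.Lock()
        self._next, self._end = allocator.allocate()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                # Block exhausted: fetch a fresh one from the shared allocator.
                self._next, self._end = self._allocator.allocate()
            value = self._next
            self._next += 1
            return value


if __name__ == "__main__":
    allocator = RangeAllocator(block_size=3)
    a, b = Worker(allocator), Worker(allocator)
    print([a.next_id() for _ in range(5)])
    print([b.next_id() for _ in range(5)])
```

The trade-off is that a larger block means fewer round trips to the allocator, but bigger gaps if a worker dies with an unused block.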

a) You could use Twitter's Snowflake ID network service. Snowflake is:

  • a networked service, i.e. you make a network call to get a unique ID;
  • which produces 64-bit unique IDs that are ordered by generation time;
  • highly scalable and (potentially) highly available; each instance can generate many thousands of IDs per second, and you can run multiple instances on your LAN/WAN;
  • written in Scala, and runs on the JVM.

b) You could generate the unique IDs on the clients themselves, using an approach derived from how UUIDs and Snowflake's IDs are made. There are multiple options, but something along the lines of:

  • The most significant 40 or so bits: A timestamp; the generation time of the ID. (We're using the most significant bits for the timestamp to make IDs sortable by generation time.)

  • The next 14 or so bits: A per-generator counter, which each generator increments by one for each new ID generated. This ensures that IDs generated at the same moment (same timestamps) do not overlap.

  • The last 10 or so bits: A unique value for each generator. Using this, we don't need to do any synchronization between generators (which is extremely hard), as all generators produce non-overlapping IDs because of this value.
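The 40/14/10 layout above could be sketched roughly as follows. This is only an illustration under a few assumptions of mine: millisecond timestamps counted from a custom epoch (`CUSTOM_EPOCH_MS`, chosen arbitrarily, because the current Unix time in milliseconds no longer fits in 40 bits), a counter that resets whenever the timestamp advances, and a `generator_id` that you have somehow assigned uniquely to each generator. The class name `IdGenerator` is hypothetical.

```python
import threading
import time

TIMESTAMP_BITS = 40   # milliseconds since CUSTOM_EPOCH_MS (roughly 34 years)
COUNTER_BITS = 14     # per-generator counter within one millisecond
GENERATOR_BITS = 10   # unique value per generator (0..1023)

# Assumption: an arbitrary custom epoch, 2020-01-01T00:00:00Z.
CUSTOM_EPOCH_MS = 1_577_836_800_000


class IdGenerator:
    def __init__(self, generator_id, clock=None):
        if not 0 <= generator_id < (1 << GENERATOR_BITS):
            raise ValueError("generator_id must fit in GENERATOR_BITS")
        self._generator_id = generator_id
        # The clock is injectable so the behaviour can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ts = -1
        self._counter = 0

    def next_id(self):
        with self._lock:
            # Never go backwards, even if the wall clock does.
            ts = max(self._clock() - CUSTOM_EPOCH_MS, self._last_ts)
            if ts == self._last_ts:
                self._counter += 1
                if self._counter == (1 << COUNTER_BITS):
                    # Counter exhausted for this millisecond: borrow the next one.
                    ts += 1
                    self._counter = 0
            else:
                self._counter = 0
            self._last_ts = ts
            return (
                (ts << (COUNTER_BITS + GENERATOR_BITS))
                | (self._counter << GENERATOR_BITS)
                | self._generator_id
            )


if __name__ == "__main__":
    gen = IdGenerator(generator_id=7)
    print(gen.next_id(), gen.next_id())
```

Two generators with different `generator_id` values never produce the same ID, with no coordination at run time; the hard part that this sketch leaves out is handing out those generator IDs in the first place.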

c) You could generate the IDs on the clients, using just a timestamp and a random value. This avoids the need to know all generators and assign each one a unique value. On the flip side, such IDs are not guaranteed to be globally unique; they're only very highly likely to be unique. (To collide, two or more generators would have to create the same random value at the exact same time.) Something along the lines of:

  • The most significant 32 bits: Timestamp, the generation time of the ID.
  • The least significant 32 bits: 32 bits of randomness, generated anew for each ID.
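A rough sketch of that timestamp-plus-random layout, assuming a timestamp in whole seconds (the granularity is my assumption) and using Python's `secrets` module for the random half. The function name `random_id` is made up for illustration.

```python
import secrets
import time


def random_id(now=None):
    """Return a 64-bit ID: 32-bit timestamp (seconds) + 32 random bits.

    `now` is only there so the timestamp can be fixed in tests.
    """
    ts = int(time.time() if now is None else now) & 0xFFFFFFFF
    return (ts << 32) | secrets.randbits(32)


if __name__ == "__main__":
    print(hex(random_id()))
```

Keep the birthday bound in mind: with 32 random bits, the chance of at least one collision among n IDs created within the same second is roughly n² / 2³³, so this layout only suits fairly low generation rates per timestamp tick.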

d) The easy way out: use UUIDs/GUIDs.
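For example, in Python a random (version 4) UUID comes straight from the standard library. Note that these are 128 bits wide and are not sortable by generation time:

```python
import uuid

new_id = uuid.uuid4()   # 122 random bits plus version/variant bits
print(new_id)           # canonical 36-character text form
print(new_id.int)       # the same ID as a 128-bit integer
```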
