Proposed solution: Generate unique IDs in a distributed environment


Problem description

I've been browsing the net trying to find a solution that will allow us to generate unique IDs in a regionally distributed environment.

I looked at the following options (among others):

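Snowflake (by Twitter)

  • It seems like a pretty good solution, but I just don't like the added complexity of having to manage another piece of software just to create IDs;
  • It lacks documentation at this stage, so I don't think it would be a good investment;
  • The nodes need to be able to communicate with one another using Zookeeper (what about latency and communication failures?)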

UUID

  • Just look at it: 550e8400-e29b-41d4-a716-446655440000;
  • It's a 128-bit ID;
  • There have been some known collisions (depending on the version, I guess); see this post.
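For comparison, here is a minimal sketch of how a random (version 4) UUID is put together, to make the "128-bit" point concrete. The function name `uuid_v4` is a hypothetical helper, not part of any library; it assumes PHP 7 or later for `random_bytes`.

```php
<?php
// Hypothetical helper: builds a version-4 (random) UUID from 16 random bytes.
function uuid_v4(): string
{
    $bytes = random_bytes(16);                        // 16 bytes = 128 bits
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set the version nibble to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set the RFC 4122 variant bits
    $hex = bin2hex($bytes);                           // 32 hexadecimal characters

    // Group the hex digits as 8-4-4-4-12, the usual textual form.
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

echo uuid_v4(), PHP_EOL;
```

Six of the 128 bits are fixed by the version and variant fields, so a version 4 UUID carries 122 random bits.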

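Auto-increment in a relational database (like MySQL)

  • This seems safe enough, but unfortunately we are not using relational databases (scalability preferences);
  • We could deploy a MySQL server for this like Flickr does, but again, this introduces another point of failure/bottleneck. It also adds complexity.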

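Auto-increment in a non-relational database like Couchbase

  • This could work since we are using Couchbase as our database server, but;
  • It will not work when we have more than one cluster in different regions: with latency issues and network failures, IDs will collide at some point, depending on the amount of traffic;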

Let's say that we have clusters consisting of 10 Couchbase nodes and 10 application nodes in each of 5 different regions (Africa, Europe, Asia, America and Oceania). This is to ensure that content is served from the location closest to the user (to boost speed) and to ensure redundancy in case of disasters etc.

Now, the task is to generate IDs that won't collide when replication (and balancing) occurs, and I think this can be achieved in 3 steps:

Step 1

All regions will be assigned integer IDs (unique identifiers):

  • 1 - Africa;
  • 2 - America;
  • 3 - Asia;
  • 4 - Europe;
  • 5 - Oceania.

Step 2

Assign an ID to every application node that is added to the cluster, keeping in mind that there may be up to 99 999 servers in one cluster (I doubt we will get there; this is just a safety precaution). This will look something like this (fake IPs):

  • 00001 - 192.187.22.14
  • 00002 - 164.254.58.22
  • 00003 - 142.77.22.45
  • and so forth.

Please note that all of these are in the same cluster, so that means you can have node 00001 per region.

Step 3

For every record inserted into the database, an incremented ID will be used to identify it, and this is how it will work:

Couchbase offers an increment feature that we can use to create IDs internally within the cluster. To ensure redundancy, 3 replicas will be created within the cluster. Since these are in the same place, I think it should be safe to assume that unless the whole cluster is down, one of the nodes responsible for this will be available. If that turns out not to be enough, the number of replicas can be increased.
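The counter behaviour this step relies on can be sketched with an in-memory stand-in. The `Counter` interface and `InMemoryCounter` class below are hypothetical names introduced only for illustration; a real deployment would back `next()` with Couchbase's atomic increment operation, whose exact PHP call depends on the SDK version and is deliberately not shown here.

```php
<?php
// Hypothetical abstraction over a per-cluster atomic counter.
interface Counter
{
    // Returns the next value for the named counter, starting at 1.
    public function next(string $key): int;
}

// In-memory stand-in for illustration. It is NOT shared between processes,
// so it only demonstrates the contract, not the distributed behaviour.
final class InMemoryCounter implements Counter
{
    /** @var array<string, int> */
    private array $values = [];

    public function next(string $key): int
    {
        // Treat a missing counter as 0, then add 1 and return the new value.
        $this->values[$key] = ($this->values[$key] ?? 0) + 1;
        return $this->values[$key];
    }
}

$counter = new InMemoryCounter();
echo $counter->next('user_id'), PHP_EOL; // 1
echo $counter->next('user_id'), PHP_EOL; // 2
```

Keeping the counter behind a small interface means the ID-building code does not need to know which store provides the increment.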

Putting it all together

Say a user is signing up from Europe: the application node serving the request will grab the region code (4 in this case), get its own ID (say 00005) and then get an incremented ID (1) from Couchbase (from the same cluster).

We end up with 3 components: 4, 00005 and 1. Now, to create an ID from these, we can just join the components into 4.00005.1. To make it even better (I'm not too sure about this), we can concatenate the components (not add them up) to end up with: 4000051.

In code, this will look something like this:

$id = '4' . '00005' . '1';

NB: Not $id = 4 + 00005 + 1;
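A slightly fuller sketch of the same idea, with the node ID padded to a fixed width so the result can be split apart again. The names `build_id` and `parse_id` are hypothetical, and `parse_id` assumes a single-digit region code and a five-digit node ID as in the example above.

```php
<?php
// Hypothetical helper: joins the three components as strings.
function build_id(int $region, int $node, int $sequence, string $sep = ''): string
{
    // %05d keeps the node ID at a fixed width of five digits (5 -> "00005").
    return $region . $sep . sprintf('%05d', $node) . $sep . $sequence;
}

// Hypothetical helper: splits an ID built without a separator back into
// [region, node, sequence]. Assumes 1 region digit and 5 node digits.
function parse_id(string $id): array
{
    return [(int) $id[0], (int) substr($id, 1, 5), (int) substr($id, 6)];
}

echo build_id(4, 5, 1, '.'), PHP_EOL; // 4.00005.1
echo build_id(4, 5, 1), PHP_EOL;      // 4000051
echo 4 + 00005 + 1, PHP_EOL;          // 10 (arithmetic; 00005 is an octal literal)
```

Because the region and node parts have fixed widths, everything after the sixth character is the counter, so the concatenated form stays unambiguous even as the counter grows.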

Pros

  • IDs look better than UUIDs;
  • They seem unique enough. Even if a node in another region generated the same incremented ID and has the same node ID as the one above, we always have the region code to set them apart;
  • They can still be stored as integers (probably big unsigned integers);
  • It's all part of the architecture, no added complexities.
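On the "stored as integers" point, the digit budget is worth checking. With 1 region digit and 5 node digits, a signed 64-bit integer (maximum 9223372036854775807, 19 digits) leaves 13 digits for the counter when region codes run from 1 to 5. The check below is a sketch; `fits_in_int64` and `MAX_INT64` are hypothetical names, and it compares decimal strings so that it does not itself overflow.

```php
<?php
// Largest value of a signed 64-bit integer, kept as a string.
const MAX_INT64 = '9223372036854775807';

// Hypothetical check: does a non-negative decimal ID string fit in a
// signed 64-bit integer?
function fits_in_int64(string $id): bool
{
    $digits = ltrim($id, '0');
    if (strlen($digits) !== strlen(MAX_INT64)) {
        // Fewer digits always fits; more digits never does.
        return strlen($digits) < strlen(MAX_INT64);
    }
    // Same length: plain string comparison matches numeric order.
    return strcmp($digits, MAX_INT64) <= 0;
}

var_dump(fits_in_int64('4000051'));                              // true
var_dump(fits_in_int64('5' . '99999' . str_repeat('9', 13)));    // true
var_dump(fits_in_int64('5' . '99999' . str_repeat('9', 14)));    // false
```

Any extra digits appended to the ID, such as a random suffix, come out of the same budget.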

Cons

  • No sorting (or is there)?
  • This is where I need your input (most)
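The sorting question can be explored with a small experiment. The sketch below compares the plain concatenation with a variant whose counter is zero-padded to a fixed width; `plain_id` and `padded_id` are hypothetical helpers, and the 13-digit padding is an assumption chosen so the result still fits a signed 64-bit integer.

```php
<?php
// Hypothetical helper: concatenation as proposed, stored as an integer.
function plain_id(int $region, int $node, int $seq): int
{
    return (int) ($region . sprintf('%05d', $node) . $seq);
}

// Hypothetical helper: same layout, but the counter is zero-padded to 13 digits.
function padded_id(int $region, int $node, int $seq): string
{
    return sprintf('%d%05d%013d', $region, $node, $seq);
}

$plain = [plain_id(4, 5, 10), plain_id(4, 6, 1), plain_id(4, 5, 9), plain_id(2, 5, 3)];
sort($plain); // numeric sort
print_r($plain); // 2000053, 4000059, 4000061, 40000510

$padded = [padded_id(4, 5, 10), padded_id(4, 6, 1), padded_id(4, 5, 9), padded_id(2, 5, 3)];
sort($padded, SORT_STRING); // fixed width, so string order equals numeric order
print_r($padded);
```

With the plain form, node 00006's first record (4000061) lands between node 00005's ninth and tenth records. With the padded form, IDs group by region, then node, then counter. Neither ordering is chronological across nodes or regions, since each counter advances independently.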

I know that every solution has flaws, and possibly more than what we see on the surface. Can you spot any issues with this whole approach?

Thank you in advance for your help :-)

Edit

As @DaveRandom suggested, we can add the 4th step:

Step 4

We can just generate a random number and append it to the ID to prevent predictability. Effectively, you end up with something like this:

4000051357 instead of just 4000051
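A sketch of this step, assuming a fixed-width suffix of three digits as in the example above. `with_random_suffix` is a hypothetical helper; `random_int` is PHP's built-in cryptographically secure integer generator (PHP 7 and later).

```php
<?php
// Hypothetical helper: appends a fixed-width random suffix to an existing ID.
function with_random_suffix(string $id, int $digits = 3): string
{
    $max = (10 ** $digits) - 1;     // 999 for three digits
    $suffix = random_int(0, $max);  // unpredictable value in [0, $max]

    // Left-pad with zeros so the suffix always has exactly $digits characters.
    return $id . str_pad((string) $suffix, $digits, '0', STR_PAD_LEFT);
}

echo with_random_suffix('4000051'), PHP_EOL;
```

The suffix adds unpredictability, not uniqueness: uniqueness still comes from the region, node and counter parts. Keeping the width fixed means the suffix can be stripped off again with `substr($id, 0, -3)`.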

Recommended answer

I think this looks pretty solid. Each region maintains consistency, and if you use XDCR there are no collisions. INCR is atomic within a cluster, so you will have no issues there. You don't actually need the machine code part. If all the app servers within a region are connected to the same cluster, there is no need to infix the 00001 part. If it is useful to you for other reasons (some sort of analytics), then by all means keep it, but it isn't necessary.

So it can simply be '4' . '1' (using your example).

Can you give me an example of what kind of "sorting" you need?

First: One downside of adding entropy (and I am not sure why you would need it) is that you cannot iterate over the ID collection as easily.

For example: If your IDs run from 1 to 100, which you will know from a simple GET query on the counter key, you could assign tasks by group: one task takes 1-10, the next 11-20 and so on, and workers can execute in parallel. If you add entropy, you will need to use a Map/Reduce View to pull the collections down, so you lose the benefit of a key-value pattern.
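The batching described here can be sketched as follows. `id_ranges` is a hypothetical helper; the maximum of 100 stands for the value read from the counter key, and the batch size of 10 is taken from the example.

```php
<?php
// Hypothetical helper: splits IDs 1..$max into consecutive [start, end] ranges.
function id_ranges(int $max, int $batchSize): array
{
    $ranges = [];
    for ($start = 1; $start <= $max; $start += $batchSize) {
        // The last range is clipped so it never runs past $max.
        $ranges[] = [$start, min($start + $batchSize - 1, $max)];
    }
    return $ranges;
}

foreach (id_ranges(100, 10) as [$from, $to]) {
    echo "worker handles IDs $from-$to", PHP_EOL;
}
```

Each worker can then fetch its documents by key directly, which is what stops being possible once a random part is appended to the ID.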

Second: Since you are concerned with readability, it can be valuable to add a document/object type identifier as well, and this can be used in Map/Reduce Views (or you can use a JSON key to identify that).

Ex: 'u:' . '4' . '1'
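A sketch of building and reading such a key. `document_key` and `key_type` are hypothetical helpers, and 'u' as the prefix for user documents is taken from the example above.

```php
<?php
// Hypothetical helper: prefixes the region and counter with a document type.
function document_key(string $type, int $region, int $sequence): string
{
    return $type . ':' . $region . $sequence;
}

// Hypothetical helper: returns the type prefix of a key (the part before ':').
function key_type(string $key): string
{
    return explode(':', $key, 2)[0];
}

$key = document_key('u', 4, 1);
echo $key, PHP_EOL;           // u:41
echo key_type($key), PHP_EOL; // u
```

The same prefix test can be applied to document IDs inside a view to pick out one document type.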

If you are referring to IDs externally, you might want to obscure them in other ways. If you need an example, let me know and I can append something you could do to my answer.

@scalabl3
