你对以这种方式砍掉 type-4 UUID 有什么看法 [英] What do you say of chopping type-4 UUID in this manner

查看:28
本文介绍了你对以这种方式砍掉 type-4 UUID 有什么看法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

检查这个,

    List<String> list = new ArrayList<String>();
    for (int i = 0; i < 10000; i++) {
        String value = (""+UUID.randomUUID().getLeastSignificantBits()).substring(3, 20);
        assertFalse(list.contains(value));
        assertTrue(value.length() < 18);
        list.add(value);
    }

这个方法就像魅力一样传递.我的印象是,取最不重要的位比最重要的位要好一些.因为在最重要的位中,您有 6 位固定用于某些信息,而最不重要的情况并非如此.因此,平均而言,我们需要生成 2^29 个 UUID 才能与最高有效位发生冲突,但需要生成 2^32 个与最低有效位的冲突.参考:SO 线程.我的假设是否正确?

This method is passing like charm. And I am in the impression that it is slightly better to take least significant bits, rather than the most significant. Because in the most significant bits you have 6 bits fixed for some info, and with least significant its not the case. Hence, on average we need to generate 2^29 UUIDs to get a collision with most significant bits, but 2^32 many with least significant bits. Ref: SO Thread. Am I right in assuming that?

现在,我在这里将我从该方法中获得的最低有效位的 2 个最高有效位砍掉.我正在使用子字符串.请注意,我正在截断 2 位数字和一个符号位.这是否意味着现在我们平均需要生成 2^31 个 UUID 才能发生碰撞?

Now, here I am chopping 2 more most significant digits of the least significant bits I got from the method. I am using substring to that. Notice I am chopping 2 digits and a sign bit off. Does that not mean that now on average we need to generate 2^31 UUIDs to get a collision?

确切地说,我正在尝试生成一个长度不应超过 17 位的唯一标识符.并且它必须是一个整数,而不是Java类型意义上的.我的方法有多可靠?

Precisely, I am trying to generate a unique identifier which should not exceed 17 digit length. And it must be an integer, not in a sense of Java type. How reliable is my approach?

元信息:

实际上,我们正在与一些遗留系统集成,我们必须提供一些不超过 17 位的唯一编号.我假设他们将它作为数据库唯一键.在这种情况下我们也可以使用序列,这是我首先提出的.但是他们对我说,如果我能想出一个随机数来代替它,那就太好了,这样消费者就猜不出来了.

Actually, we are integrating with some legacy system, and we must provide some unique number not more than 17 digits. They are having it as a database unique key, I assume. We can also use sequence in this case, and I proposed that in the first place. But they said to me that its good if I can come up with a random number instead, so consumer can't guess.

据我所知,关于 Java 中 UUID 的 type-4 实现,我们平均需要生成 2^61 个 UUID 才能发生碰撞.这不意味着我们需要生成 2^32 才能在最低有效位上发生冲突,而 2^29 才能在最高有效位上发生冲突吗?如果是,那么假设我们需要平均生成 2^31 才能在截断 2 个最左边的数字后在最低有效位上发生冲突是不正确的吗?

As far as I know regarding the type-4 implementation of UUID in Java we need to generate 2^61 UUIDs on average to get a collision. Does that not means that we need to generate 2^32 to get the collision on least significant bits, and 2^29 to get the collision on most significant bits? If yes then is it not correct to assume that we need to generate on average 2^31 to get the collision on least significant bits after chopping of 2 left most digits?

我也尝试使用 SecureRandom,但这也给了我 19 位长的值.因此,我最终也先砍到了那个数字.下面是代码.

I tried to use SecureRandom as well, but that is also giving me 19 digits long value. Hence I end up chopping first to digits of that too. Below is the code for that.

    List<String> list = new ArrayList();
    Random random = new SecureRandom();
    for (int i = 0; i < 10000; i++) {
        String value = ""+random.nextLong().substring(2, 19);
        assertFalse(list.contains(value));
        assertTrue(value.length() < 18);
        list.add(value);
    }

我能想到的另一个选项是以yyMMddHHmmssSSS+2-seq-digits"格式使用日期.但我想这将非常依赖于处理器,并且可以猜测.因为我不太确定我在 99 轮后得到了毫秒的变化.可能我会,但这取决于处理器速度.99 个同时请求是不太可能的.

The other option I can think of is to use date in a format "yyMMddHHmmssSSS+2-seq-digits". But that would be quite processor dependent, and guessable, I suppose. Because I am not quite certain that I got a change in millisecond after 99 rounds. May be I will, but that would depend on processor speed. 99 simultaneous requests are quite unlikely though.

推荐答案

我建议您使用 Random 或 SecureRandom 来生成随机位并将它们转换为数字.那应该更便携.

I suggest that you use either Random or SecureRandom to generate random bits and turn those into a number. That should be more portable.

我不明白你关于砍数字的观点.假设您从长周期 PRNG 的足够位生成一个 17(十进制)数字,对于任何给定的生成数字对,您应该有 10**17 次碰撞的机会.如果源是好的,并且您使用了足够的位,那么您是否在砍"并不重要......

I don't understand your point about chopping digits. Assuming that you are generating a 17 (decimal) digit number from enough bits from a long-cycle PRNG you should have a chance of 1 in 10**17 of a collision for any given generated pair of numbers. If the source is good, and you use enough bits then it is immaterial that you are "chopping" ...

我不清楚 10**17 中的 1 是否足够好.这取决于在任何给定时间(在您的持久存储中)将存在多少个数字.例如,如果您有 4400 万个数字,那么至少一对之间发生冲突的几率约为 1%.

It is not clear to me that 1 in 10**17 is good enough. It depends on how many numbers are going to exist (in your persistent store) at any given time. For instance, if you have 44 million numbers extant, the chance of a collision between at least one pair will be about 1%.

尝试将一些数字插入生日悖论计算器.

我认为您需要的是一个生成器,它为您提供具有长循环长度的 64 位伪随机数,并且绝对保证不会重复您可能生成的更多数字.还必须能够保持生成器的状态并恢复它.然后,要获得一个 17 位十进制数字的随机"数,从生成器中获取下一个值并测试它是否在 0 ... 10**17 - 1 范围内.如果是,就使用它,如果不是重复.

I think what you need is a generator that gives you 64 bit pseudo random numbers with a long cycle length AND an absolute guarantee of no repetitions for more numbers than you could ever possibly generate. It must also be possible to persist the state of the generator and resume it. Then, to get a 17 decimal digit "random" number, get the next value from the generator and test if it is in the range 0 ... 10**17 - 1. If it is, use it, if not repeat.

如果您正确管理发电机,则在系统的整个生命周期内都不会重复,因此发生碰撞的风险为零.但重要的是您使用 PRNG(不是真正的 RNG)并选择具有正确属性的 PRNG.

If you manage the generator correctly, you never get a repetition for the lifetime of the system, and therefore there is zero risk of a collision. But it is important that you use a PRNG (not a true RNG) and that you pick a PRNG that has the right properties.

据我所知,Random 类提供了一个循环长度为 2**48 的 PRNG;即您应该在数字开始重复之前获得 2**48 数字(例如使用 getLong() 方法).OTOH,SecureRandom 提供具有非常长循环计数的真正随机或伪随机......但在每次调用时重复数字的机会很小但非零.

From what I can tell, the Random class offers a PRNG with a cycle length of 2**48; i.e. you should get 2**48 numbers (e.g. using the getLong() method) before the numbers start to repeat. OTOH, SecureRandom gives either truly random or pseudo-random with a very long cycle count ... but with a small but non-zero chance of repeating a number on each call.

这篇关于你对以这种方式砍掉 type-4 UUID 有什么看法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆