在没有预混洗的情况下生成(非常)大的非重复整数序列 [英] Generating (very) large non-repeating integer sequence without pre-shuffling

查看:143
本文介绍了在没有预混洗的情况下生成(非常)大的非重复整数序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

背景



我写了一个简单的媒体客户端/服务器,并且我想要生成一个非显而易见的时间值,客户端到服务器。时间戳会有一些相当数据(即使它不是真正准确的,由于现代操作系统中定时器采样的限制)等等,纳秒分辨率等等。

我试图做的(在Linux上,C语言)是生成一个n位值的一对一序列(让我们假设现在数据存储在128位的int数组元素中)没有重叠/碰撞值。然后,我会将一个伪随机128位值/数字作为盐,将其应用于时间戳,然后开始向服务器发送命令,增加预先腌制/预先哈希值。



时间戳大小的原因是因为时间戳可能需要适应非常长的时间。




问题



一个具有初始盐值的序列(非碰撞)? 根据我的目标发出的最佳方法是这个帖子,注意到

lockquote

如果选项1对您来说不够随机,则使用CRC-32所述
全局(32位)计数器的散列值。在
N位整数与它们的CRC-N之间有一对一的映射(bijection),所以唯一性将得到保证。

但是,我不知道:


  • 如果可以(有效地)扩展到128位数据。 / li>
  • 如果某种形式的加法/乘法 - 按盐值为序列提供初始种子会破坏它或引入冲突。
>




后续



我意识到我可以使用128位随机哈希从 libssl 或类似的东西,但我希望远程服务器,使用相同的盐值,能够将散列时间戳转换回它们的真实值。



谢谢。

解决方案

您可以使用线性同余发生器。使用正确的参数,它保证产生具有全部周期(即没有冲突)的非重复序列[独特]序列。



这就是 random(3)用于 TYPE_0 模式。我对它进行了调整,使其满足 unsigned int 范围,种子可以是任何 unsigned int (请参阅下面的示例代码) 。



我相信它可以扩展到64位或128位。我会看看: https://en.wikipedia.org/wiki/Linear_congruential_generator 以查看参数约束以防止冲突和良好的随机性。



遵循wiki页面指南,您可以制作一个可以带 128位值作为种子,直到生成所有可能的128位数字后才会重复。

您可能需要编写一个程序来生成合适的参数对,然后测试他们为最好的随机性。这是一次性操作。



获得它们后,只需在实际应用程序中将这些参数插入到您的公式中即可。






以下是我一直在玩的一些代码,当我寻找类似的东西时:

  // _prngstd  - 获得随机数
static inline u32
_prngstd(prng_p prng)
{
long rhs;
u32 lhs;

//注意:随机更快且有_long_期间,但是_only_产生
//正整数,但jrand48产生正和负数
#if 0
rhs = jrand48(btc-> btc_seed);
lhs = rhs;
#endif

//这有冲突
#if 0
rhs = rand();
PRNG_FLIP;
#endif

//这有碰撞,因为它默认为TYPE_3
#if 0
rhs = random();
PRNG_FLIP;
#endif

//这是TYPE_0(线性同余)模式中的随机数
#if 0
prng-> prng_state =((prng-> prng_state * 1103515245)+ 12345)& 0x7FFFFFFF的;
rhs = prng-> prng_state;
PRNG_FLIP;
#endif

//这是在TYPE_0(线性同余)模式中的随机掩码
//删除以获取全范围数字
//这样做不_not_产生重叠
#if 1
prng-> prng_state =((prng-> prng_state * 1103515245)+ 12345);
rhs = prng-> prng_state;
lhs = rhs;
#endif

return lhs;
}


Background

I have a simple media client/server I've written, and I want to generate a non-obvious time value I send with each command from the client to the server. The timestamps will have a fair bit of data in them (nano-second resolution, even if it's not truly accurate, due to limitations of timer sampling in modern operating systems), etc.

What I'm trying to do (on Linux, in C), is to generate a one-to-one sequence of n-bit values (let's assume data is store in 128bit array-of-int elements for now) with no overlapping/colliding values. I would then take a pseudo-random 128bit value/number as a "salt", apply it to the timestamp, and then start sending off commands to the server, incrementing the pre-salted/pre-hashed value.

The reason the timestamp size is so large is because the timestamp may have to accommodate a very large duration of time.


Question

How could I accomplish such a sequence (non-colliding) with an initial salt value? The best approach that sounds along the lines of my goal is from this post, which notes:

If option 1 isn't "random" enough for you, use the CRC-32 hash of said global (32-bit) counter. There is a 1-to-1 mapping (bijection) between N-bit integers and their CRC-N so uniqueness will still be guaranteed.

However, I do not know:

  • If that can (efficiently) be extended to 128-bit data.
  • If some sort of addition-to/multiplication-by salt-value to provide the initial seed for the sequence would disrupt it or introduce collisions.

Follow-up

I realize that I could use a 128bit random hash from libssl or something similar, but I want the remote server, using the same salt value, to be able to convert the hashed timestamps back into their true values.

Thank you.

解决方案

You could use a linear congruential generator. With the right parameters, it is guaranteed to produce non-repeating sequences [unique] sequences with a full period (i.e. no collisions).

This is what random(3) uses in TYPE_0 mode. I adapted it for a full unsigned int range and the seed can be any unsigned int (See my sample code below).

I believe it can be extended to 64 or 128 bits. I'd have a look at: https://en.wikipedia.org/wiki/Linear_congruential_generator to see about the constraints on parameters to prevent collisions and good randomness.

Following the wiki page guidelines, you could produce one that can take any 128 bit value as the seed and will not repeat until all possible 128 bit numbers have been generated.

You may need to write a program to generate suitable parameter pairs and then test them for the "best" randomness. This would be a one time operation.

Once you've got them, just plug these parameters into your equation in your actual application.


Here's some code of mine that I had been playing with when I was looking for something similar:

// _prngstd -- get random number
static inline u32
_prngstd(prng_p prng)
{
    long rhs;
    u32 lhs;

    // NOTE: random is faster and has a _long_ period, but it _only_ produces
    // positive integers but jrand48 produces positive _and_ negative
#if 0
    rhs = jrand48(btc->btc_seed);
    lhs = rhs;
#endif

    // this has collisions
#if 0
    rhs = rand();
    PRNG_FLIP;
#endif

    // this has collisions because it defaults to TYPE_3
#if 0
    rhs = random();
    PRNG_FLIP;
#endif

    // this is random in TYPE_0 (linear congruential) mode
#if 0
    prng->prng_state = ((prng->prng_state * 1103515245) + 12345) & 0x7fffffff;
    rhs = prng->prng_state;
    PRNG_FLIP;
#endif

    // this is random in TYPE_0 (linear congruential) mode with the mask
    // removed to get full range numbers
    // this does _not_ produce overlaps
#if 1
    prng->prng_state = ((prng->prng_state * 1103515245) + 12345);
    rhs = prng->prng_state;
    lhs = rhs;
#endif

    return lhs;
}

这篇关于在没有预混洗的情况下生成(非常)大的非重复整数序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆