在生成.NET的所有整数随机,非重复序列 [英] Generating a random, non-repeating sequence of all integers in .NET

查看:159
本文介绍了在生成.NET的所有整数随机,非重复序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有没有.NET的方式产生的全部 32位整数(的Int32 )以随机顺序,无需重复序列和在一个存储器有效的方式? 。内存效率将意味着最多可使用主内存只有几百兆字节



在理想情况下的顺序应该是这样的一个的IEnumerable< INT> ,它懒洋洋地返回序列中,要求只有当



我做了一些快速的研究,我发现了一些局部的解决方案,下一个号码。这样:




  • 使用最大线性反馈移位寄存器 - 如果我理解正确的,它只是在递增序列生成的数字,并没有覆盖整个范围

  • 使用的费雪耶茨或其它洗牌算法对集合 - 这将违反内存限制给出的大范围的

  • 维护一套般的收集和保存产生一个随机整数(可能使用随机),直到它不会重复,也就是说,它不是在设置 - 除了可能不能满足对内存的需求,它将获得可笑的慢时产生序列中的最后一个数字。

  • 超过32位的随机排列,但我不能想办法,以确保非重复性。



有另一种方式来看待这个问题 - 也许走的是固定范围值的优势 - 这将给予满足存储需求的解决方案?也许.NET类库来的东西有用吗?



更新1



感谢大家对你的洞察力和创造性的建议的解决方案。我会尝试尽快实施和测试(均为正确性和记忆效率)在这里提出的2或3个最有前途的解决方案,发布结果,然后选择一个胜利者。



更新2



我试图实施HVD的中的下面评论。我试图同时使用 BitArray .NET和我的定制实现,因为.NET一是仅限于 int.MaxValue 项,所以不足以支付整数的整个范围。



我喜欢这个主意的简单,我愿意牺牲的512 MB的内存,如果它工作得很好。不幸的是,运行时速度很慢,花费达几十秒钟生成我的机器,它有一个3.5 GHz的酷睿i7 CPU上的下一个随机数。所以很遗憾,如果你要求要产生很多很多的随机数,这是不能接受的。我想这是可预见的,虽然,这是一个O(M×N个)的算法,如果我没有记错,其中N为2 ^ 32,M为要求整数的数量,因此所有这些迭代产生一定的影响。



在理想情况下,我想生成O(1)时间的下一个随机数,同时还能满足对内存的需求,也许下一个算法这里建议适合于这一点。我会尽快,我可以给他们一个尝试。



更新3



我刚刚测试过的的Linear同余发生器的,我可以说我很满意的结果。它看起来像在这个线程的赢家位置的有力争夺者。



正确性:生成的只出现一次的所有整数(我用了一下向量。检查)



随机性:还算不错



内存使用:太好了,只有几个字节



运行时间:生成一个随机整数非常快的,因为你可以期待从O(1)算法。生成每个整数共有约拿了。 11秒我的机器上。



总之,我会说这是一个非常合适的技术,如果你不是在找高度随机的序列。



更新4



模块化的下面介绍475673分之35103941>乘法逆技术的行为非常类似于LCG技术 - 并不奇怪,因为两者都基于模运算 - ,但我发现它有点不太明了,以产生令人满意随机实施序列。



一个有趣的差异是,我发现这种方法似乎比LCG快:生成的整个序列了约8秒,与11秒LCG。除此之外,关于内存的效率,准确性和随机性其他所有的言论都是一样的。



更新5



看起来像用户 TomTom公司删除其与梅森难题的答案,恕不另行通知后,我在一家中指出评论说,我发现它产生比要求的重复号码越快。所以我想这完全排除了梅森倍捻机。



更新6



我测试了另一种建议的技术,看起来很有希望, Skip32 ,而我真正喜欢的随机数的质量,算法不适合用于产生在时间上可接受的量的整数的整个范围。所以很遗憾相比,能够完成过程中的其他技术,当它落在短。我用C#的实现从这里,顺便说一句 - 我改变了代码数量减少轮1的,但它仍然不能及时完成。



毕竟,由上述结果来看,对于溶液我个人的选择进到 模块化乘法逆 的技术,其次紧密的的linear~~V同余发生器。有些人可能会认为这是在某些方面与其他技术逊色,但考虑到我原来的限制,我会说这最适合他们。


解决方案

有没有.NET的方式




其实,这可以在大部分的语言完成




要生成所有的32位整数序列(Int32)已







在随机顺序,




在这里我们需要对术语同意,因为随机是不是大多数人认为它是。更详细的说明一下。




不重复,







和在存储效率的方式?







内存效率意味着使用最多的只有几百兆字节主存储器中。




好了,所以就用几乎没有记忆是可以接受的? ; - )



得到的建议之前,我们需要澄清的随机性的问题。东西是真正的随机没有明显的模式。因此,在一排运行算法百万次可能的理论上的返回在所有迭代相同的值。如果在这个概念引发必须从现有迭代不同的,则它不再是随机的。然而,看着都要求一起,似乎一切真的是被要求的是不同的整数分布格局。这是可行的。



那么如何有效地做到这一点?利用模块化乘法逆。我用这个来回答以下问题其中有一个类似的要求,产生非重复的,一定范围内的伪随机,样本数据:



生成在给定的时间间隔不同的随机时间



我第一次听说这个概念在这里(的生成的SQL Server 看似随意的唯一数字ID),你可以使用下面的在线计算器,以确定您的整数和模块化乘法逆(MMI)值:





这里运用这一概念,你会使用Int32.MaxSize作为模值。



这将使随机分布的一个明确的外观没有机会碰撞和无需记忆存储已使用的值。



唯一最初的问题是,分布的图案总是一样给予相同的整数和MMI值。所以,你能想出不同的模式通过添加一个随机生成的诠释为初始值(我相信我在有关生成SQL Server中的样本数据回答一样),也可以预先生成若干组合整型和相应的MMI的价值观,这些存储在配置文件/字典,并使用.NET随机函数选一个在每次运行的开始。即使你存储100组合,几乎是没有内存使用(假设它不是一个配置文件)。事实上,如果存储都为INT和字典使用诠释为指标,那么1000值约为12K?






更新



注:




  • 有一个在结果模式,但它无法辨别,除非你有足够他们在任何时候看总。对于大部分的用例,这是可以接受的,因为没有值的收件人将有一个大的集合当中,或者知道他们被按顺序分配没有任何间隙(以及知识是必需的,以便确定是否存在一个图案)

  • 只有两个变量值的1 - 整数和模反元素(MMI) - 是必要的公式为特定的奔跑在。因此:


    • 每一对给出了两个不同的序列

    • 如果保持在内存中的一组,只需要一个简单的数组,并假设该阵列的索引是仅仅从阵列的基地址在存储器中的偏移,则所需的存储器应该只有4字节*容量(即1024选项是仅4k的,是吗?)




下面是一些测试代码。这是写在T-SQL的Microsoft SQL Server,因为这是我主要的工作,而且它还具有使它真正容易样测试的唯一性,最小值和最大值等,而无需编写任何的优势。语法将在SQL Server 2008或更新的工作。对于SQL Server 2005,变量初始化尚未出台又那么每个定义包含一个 = 只会需要本身和 SET @Variable = ... 然而,对于该变量被分成了定义正在初始化。和 SET @index + = 1; 将需要成为设置@index = @index + 1;



如果您对供应产生任何重复值测试代码会报错。和最终的查询指示,如果有,因为它可以推断,如果表变量人口没有错误(因此无重复)的任何间隙,的值的总数目是预期的数目,则存在只能是空白(即缺失值)如果实际的最小值和最大值的一个或两个是预期值的外面。



请注意,这个测试的代码不暗示任何值都预先产生或需要存储。该代码只是为了测试唯一性和最大/最小值存储的值。在实践中,所有需要的是一个简单的公式,所有需要传递到它:




  • 的能力(尽管这还可以硬编码在这种情况下)

  • 的MMI /整型值

  • 目前的指数



所以,你只需要保持2 - 3简单的值



<预类=郎-SQL prettyprint,覆盖> 定义@TotalCapacity INT = 30; - 模; -5〜+ 4 = 10或Int32.MinValue
- 要Int32.MaxValue =(UInt32.MaxValue + 1)
DECLARE @MMI INT = 7; - 模反元素(MMI)或
- 整数(从@TotalCapacity派生)

DECLARE @Offset INT = 0; - 必须保持为0,如果最小值和最大值是硬集
-----------
DECLARE @index INT =(1 + @Offset); - 启动

DECLARE @EnsureUnique表([ORDERNUM] INT NOT NULL IDENTITY(1,1),
[超值] INT NOT NULL UNIQUE);
SET NOCOUNT ON;

BEGIN TRY
在(@index≤(@TotalCapacity + 1 + @Offset)) - 范围+ 1
BEGIN
INSERT INTO @EnsureUnique([值])值(
((@index * @MMI)%@TotalCapacity) - (@TotalCapacity / 2)+ @Offset $ b $二);
设置@index + = 1;
端;
端TRY
BEGIN CATCH
DECLARE @Error NVARCHAR(4000)= ERROR_MESSAGE();
RAISERROR(@Error,16,1);
返回;
END CATCH;

SELECT * FROM @EnsureUnique ORDER BY [ORDERNUM] ASC;
SELECT COUNT(*)AS [TotalValues],
@TotalCapacity AS [ExpectedCapacity],
MIN([超值])AS [MINVALUE],
(@TotalCapacity / -2 )AS [ExpectedMinValue],
MAX([超值])AS [MaxValue的],
(@TotalCapacity / 2) - 1 AS [ExpectedMaxValue]
从@EnsureUnique;


Is there a way in .NET to generate a sequence of all the 32-bit integers (Int32) in random order, without repetitions, and in a memory-efficient manner? Memory-efficient would mean using a maximum of just a few hundred mega bytes of main memory.

Ideally the sequence should be something like an IEnumerable<int>, and it lazily returns the next number in sequence, only when requested.

I did some quick research and I found some partial solutions to this:

  • Using a maximal linear feedback shift register - if I understood correctly, it only generates the numbers in increasing sequence and it does not cover the entire range
  • Using the Fisher-Yates or other shuffling algorithms over collections - this would violate the memory restrictions given the large range
  • Maintaining a set-like collection and keep generating a random integer (perhaps using Random) until it doesn't repeat, i.e. it's not in the set - apart from possibly failing to satisfy the memory requirements, it would get ridiculously slow when generating the last numbers in the sequence.
  • Random permutations over 32 bits, however I can't think of a way to ensure non-repeatability.

Is there another way to look at this problem - perhaps taking advantage of the fixed range of values - that would give a solution satisfying the memory requirements? Maybe the .NET class libraries come with something useful?

UPDATE 1

Thanks everyone for your insights and creative suggestions for a solution. I'll try to implement and test soon (both for correctness and memory efficiency) the 2 or 3 most promising solutions proposed here, post the results and then pick a "winner".

UPDATE 2

I tried implementing hvd's suggestion in the comment below. I tried using both the BitArray in .NET and my custom implementation, since the .NET one is limited to int.MaxValue entries, so not enough to cover the entire range of integers.

I liked the simplicity of the idea and i was willing to "sacrifice" those 512 MB of memory if it worked fine. Unfortunately, the run time is quite slow, spending up to tens of seconds to generate the next random number on my machine, which has a 3.5 GHz Core i7 CPU. So unfortunately this is not acceptable if you request many, many random numbers to be generated. I guess it's predictable though, it's a O(M x N) algorithm if i'm not mistaken, where N is 2^32 and M is the number of requested integers, so all those iterations take their toll.

Ideally i'd like to generate the next random number in O(1) time, while still meeting the memory requirements, maybe the next algorithms suggested here are suited for this. I'll give them a try as soon as I can.

UPDATE 3

I just tested the Linear Congruential Generator and i can say i'm quite pleased with the results. It looks like a strong contender for the winner position in this thread.

Correctness: all integers generated exactly once (i used a bit vector to check this).

Randomness: fairly good.

Memory usage: Excellent, just a few bytes.

Run time: Generates the next random integer very fast, as you can expect from an O(1) algorithm. Generating every integer took a total of approx. 11 seconds on my machine.

All in all i'd say it's a very appropriate technique if you're not looking for highly randomized sequences.

UPDATE 4

The modular multiplicative inverse technique described below behaves quite similarly to the LCG technique - not surprising, since both are based on modular arithmetic -, although i found it a bit less straightforward to implement in order to yield satisfactorily random sequences.

One interesting difference I found is that this technique seems faster than LCG: generating the entire sequence took about 8 seconds, versus 11 seconds for LCG. Other than this, all other remarks about memory efficiency, correctness and randomness are the same.

UPDATE 5

Looks like user TomTom deleted their answer with the Mersenne Twister without notice after i pointed out in a comment that i found out that it generates duplicate numbers sooner than required. So i guess this rules out the Mersenne Twister entirely.

UPDATE 6

I tested another suggested technique that looks promising, Skip32, and while i really liked the quality of the random numbers, the algorithm is not suitable for generating the entire range of integers in an acceptable amount of time. So unfortunately it falls short when compared to the other techniques that were able to finish the process. I used the implementation in C# from here, by the way - i changed the code to reduce the number of rounds to 1 but it still can't finish in a timely manner.

After all, judging by the results described above, my personal choice for the solution goes to the modular multiplicative inverses technique, followed very closely by the linear congruential generator. Some may argue that this is inferior in certain aspects to other techniques, but given my original constraints i'd say it fits them best.

解决方案

Is there a way in .NET

Actually, this can be done in most any language

to generate a sequence of all the 32-bit integers (Int32)

Yes.

in random order,

Here we need to agree on terminology since "random" is not what most people think it is. More on this in a moment.

without repetitions,

Yes.

and in a memory-efficient manner?

Yes.

Memory-efficient would mean using a maximum of just a few hundred mega bytes of main memory.

Ok, so would using almost no memory be acceptable? ;-)

Before getting to the suggestion, we need to clear up the matter of "randomness". Something that is truly random has no discernible pattern. Hence, running the algorithm millions of times in a row could theoretically return the same value across all iterations. If you throw in the concept of "must be different from the prior iteration", then it is no longer random. However, looking at all of the requirements together, it seems that all that is really being asked for is "differing patterns of distribution of the integers". And this is doable.

So how to do this efficiently? Make use of Modular multiplicative inverses. I used this to answer the following Question which had a similar requirement to generate non-repeating, pseudo-random, sample data within certain bounds:

Generate different random time in the given interval

I first learned about this concept here ( generate seemingly random unique numeric ID in SQL Server ) and you can use either of the following online calculators to determine your "Integer" and "Modular Multiplicative Inverses (MMI)" values:

Applying that concept here, you would use Int32.MaxSize as the Modulo value.

This would give a definite appearance of random distribution with no chance for collisions and no memory needed to store already used values.

The only initial problem is that the pattern of distribution is always the same given the same "Integer" and "MMI" values. So, you could come up with differing patterns by either adding a "randomly" generated Int to the starting value (as I believe I did in my answer about generating the sample data in SQL Server) or you can pre-generate several combinations of "Integer" and corresponding "MMI" values, store those in a config file / dictionary, and use a .NET random function to pick one at the start of each run. Even if you store 100 combinations, that is almost no memory use (assuming it is not in a config file). In fact, if storing both as Int and the dictionary uses Int as an index, then 1000 values is approximately 12k?


UPDATE

Notes:

  • There is a pattern in the results, but it is not discernible unless you have enough of them at any given moment to look at in total. For most use-cases, this is acceptable since no recipient of the values would have a large collection of them, or know that they were assigned in sequence without any gaps (and that knowledge is required in order to determine if there is a pattern).
  • Only 1 of the two variable values -- "Integer" and "Modular Multiplicative Inverse (MMI)" -- is needed in the formula for a particular run. Hence:
    • each pair gives two distinct sequences
    • if maintaining a set in memory, only a simple array is needed, and assuming that the array index is merely an offset in memory from the base address of the array, then the memory required should only be 4 bytes * capacity (i.e. 1024 options is only 4k, right?)

Here is some test code. It is written in T-SQL for Microsoft SQL Server since that is where I work primarily, and it also has the advantage of making it real easy-like to test for uniqueness, min and max values, etc, without needing to compile anything. The syntax will work in SQL Server 2008 or newer. For SQL Server 2005, initialization of variables had not been introduced yet so each DECLARE that contains an = would merely need to be separated into the DECLARE by itself and a SET @Variable = ... for however that variable is being initialized. And the SET @Index += 1; would need to become SET @Index = @Index + 1;.

The test code will error if you supply values that produce any duplicates. And the final query indicates if there are any gaps since it can be inferred that if the table variable population did not error (hence no duplicates), and the total number of values is the expected number, then there could only be gaps (i.e. missing values) IF either or both of the actual MIN and MAX values are outside of the expected values.

PLEASE NOTE that this test code does not imply that any of the values are pre-generated or need to be stored. The code only stores the values in order to test for uniqueness and min / max values. In practice, all that is needed is the simple formula, and all that is needed to pass into it is:

  • the capacity (though that could also be hard-coded in this case)
  • the MMI / Integer value
  • the current "index"

So you only need to maintain 2 - 3 simple values.

DECLARE @TotalCapacity INT = 30; -- Modulo; -5 to +4 = 10 OR Int32.MinValue
                                 -- to Int32.MaxValue = (UInt32.MaxValue + 1)
DECLARE @MMI INT = 7; -- Modular Multiplicative Inverse (MMI) or
                      -- Integer (derived from @TotalCapacity)

DECLARE @Offset INT = 0; -- needs to stay at 0 if min and max values are hard-set
-----------
DECLARE @Index INT = (1 + @Offset); -- start

DECLARE @EnsureUnique TABLE ([OrderNum] INT NOT NULL IDENTITY(1, 1),
                             [Value] INT NOT NULL UNIQUE);
SET NOCOUNT ON;

BEGIN TRY
    WHILE (@Index < (@TotalCapacity + 1 + @Offset)) -- range + 1
    BEGIN
        INSERT INTO @EnsureUnique ([Value]) VALUES (
                 ((@Index * @MMI) % @TotalCapacity) - (@TotalCapacity / 2) + @Offset
                                                   );
        SET @Index += 1;
    END;
END TRY
BEGIN CATCH
    DECLARE @Error NVARCHAR(4000) = ERROR_MESSAGE();
    RAISERROR(@Error, 16, 1);
    RETURN;
END CATCH;

SELECT * FROM @EnsureUnique ORDER BY [OrderNum] ASC;
SELECT COUNT(*) AS [TotalValues],
       @TotalCapacity AS [ExpectedCapacity],
       MIN([Value]) AS [MinValue],
       (@TotalCapacity / -2) AS [ExpectedMinValue],
       MAX([Value]) AS [MaxValue],
       (@TotalCapacity / 2) - 1 AS [ExpectedMaxValue]
FROM   @EnsureUnique;

这篇关于在生成.NET的所有整数随机,非重复序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆