如何有效地将几个字节转换为范围之间的整数? [英] How to efficiently convert a few bytes into an integer between a range?
问题描述
我正在写一些从远程随机数生成源读取字节(只是一个 List< int>
)的东西,它非常慢。为了这个和我的个人要求,我想从源中检索尽可能少的字节。
I'm writing something that reads bytes (just a List<int>
) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.
现在我正在尝试实现一个方法哪个签名看起来像:
Now I am trying to implement a method which signature looks like:
int getRandomInteger(int min, int max)
我有两种理论可以从随机源中获取字节,并将它们转换为整数。
I have two theories how I can fetch bytes from my random source, and convert them to an integer.
方法#1是天真的。获取(max - min)/ 256
字节数并将其相加。它可以工作,但它将从我的慢速随机数生成器源获取大量字节。例如,如果我想得到一个百万到零之间的随机整数,它将获取近4000字节...这是不可接受的。
Approach #1 is naivé . Fetch (max - min) / 256
number of bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between a million and a zero, it's going to fetch almost 4000 bytes... that's unacceptable.
方法#2声音对我来说很理想,但我无法想出算法。它是这样的:
Approach #2 sounds ideal to me, but I'm unable come up with the algorithm. it goes like this:
让我们以min:0,max:1000为例。
Lets take min: 0, max: 1000 as an example.
- 计算
ceil(rangeSize / 256)
在这种情况下,ceil(1000/256)= 4
。现在从源中获取一(1)个字节。 - 将这一个字节从0-255范围缩放到0-3范围(或1-4)并让它确定我们在哪个组中使用。例如。如果字节为250,我们将选择第4组(代表最后250个数字,在我们的范围内为750-1000)。
- 现在获取另一个字节并从0-255缩放到0-250并让它确定我们所拥有的组内的位置。所以如果第二个字节是例如120,那么我们的最终整数是
750 + 120 = 870
。
- Calculate
ceil(rangeSize / 256)
which in this case isceil(1000 / 256) = 4
. Now fetch one (1) byte from the source. - Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
- Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is
750 + 120 = 870
.
In那个场景我们只需要总共获取2个字节。然而,它要复杂得多,好像我们的范围是0-1000000,我们需要几个组。
In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".
我如何实现这样的东西?我对Java / C#/ JavaScript代码或伪代码没问题。
How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.
我还希望保持结果不会丢失熵/随机性。所以,我有点担心缩放整数。
I'd also like to keep the result from not losing entropy/randomness. So, I'm slightly worried of scaling integers.
推荐答案
不幸的是你的方法#1坏了。例如,如果min为0且最大为510,则需要添加2个字节。只有一种方法可以获得0结果:两个字节都为零。这个机会是(1/256)^ 2。然而,有很多方法可以获得其他值,例如100 = 100 + 0,99 + 1,98 + 2 ......所以100的概率要大得多:101(1/256)^ 2.
Unfortunately your Approach #1 is broken. For example if min is 0 and max 510, you'd add 2 bytes. There is only one way to get a 0 result: both bytes zero. The chance of this is (1/256)^2. However there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101(1/256)^2.
或多或少做标准的方法是:
The more-or-less standard way to do what you want is to:
Let R = max - min + 1 -- the number of possible random output values
Let N = 2^k >= mR, m>=1 -- a power of 2 at least as big as some multiple of R that you choose.
loop
b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)
这称为拒绝方法。它会丢弃随机选择的二进制数字,这会对输出产生偏差。如果 min-max + 1
恰好是2的幂,那么你将有零拒绝。
This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If min-max+1
happens to be a power of 2, then you'll have zero rejections.
如果你有 m = 1
和 min-max + 1
只是2的一个大的力量,然后拒绝接近一半。在这种情况下,你肯定想要更大的 m
。
If you have m=1
and min-max+1
is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want bigger m
.
一般来说,较大的m值会导致较少的拒绝,但当然它们每个数字需要更多的位数。选择 m
是一种可能的最佳算法。
In general, bigger m values lead to fewer rejections, but of course they require slighly more bits per number. There is a probabilitistically optimal algorithm to pick m
.
此处介绍的其他一些解决方案存在问题,但我对不起,我现在没有时间发表评论。如果有兴趣,也许在几天内。
Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.
这篇关于如何有效地将几个字节转换为范围之间的整数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!