如何有效地将几个字节转换为范围之间的整数? [英] How to efficiently convert a few bytes into an integer between a range?

查看:109
本文介绍了如何有效地将几个字节转换为范围之间的整数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在写一些从远程随机数生成源读取字节(只是一个 List< int> )的东西,它非常慢。为了这个和我的个人要求,我想从源中检索尽可能少的字节

I'm writing something that reads bytes (just a List<int>) from a remote random number generation source that is extremely slow. For that and my personal requirements, I want to retrieve as few bytes from the source as possible.

现在我正在尝试实现一个方法哪个签名看起来像:

Now I am trying to implement a method which signature looks like:

int getRandomInteger(int min, int max)

我有两种理论可以从随机源中获取字节,并将它们转换为整数。

I have two theories how I can fetch bytes from my random source, and convert them to an integer.

方法#1是天真的。获取(max - min)/ 256 字节数并将其相加。它可以工作,但它将从我的慢速随机数生成器源获取大量字节。例如,如果我想得到一个百万到零之间的随机整数,它将获取近4000字节...这是不可接受的。

Approach #1 is naivé . Fetch (max - min) / 256 number of bytes and add them up. It works, but it's going to fetch a lot of bytes from the slow random number generator source I have. For example, if I want to get a random integer between a million and a zero, it's going to fetch almost 4000 bytes... that's unacceptable.

方法#2声音对我来说很理想,但我无法想出算法。它是这样的:

Approach #2 sounds ideal to me, but I'm unable come up with the algorithm. it goes like this:

让我们以min:0,max:1000为例。

Lets take min: 0, max: 1000 as an example.


  • 计算 ceil(rangeSize / 256)在这种情况下, ceil(1000/256)= 4 。现在从源中获取一(1)个字节。

  • 将这一个字节从0-255范围缩放到0-3范围(或1-4)并让它确定我们在哪个组中使用。例如。如果字节为250,我们将选择第4组(代表最后250个数字,在我们的范围内为750-1000)。

  • 现在获取另一个字节并从0-255缩放到0-250并让它确定我们所拥有的组内的位置。所以如果第二个字节是例如120,那么我们的最终整数是 750 + 120 = 870

  • Calculate ceil(rangeSize / 256) which in this case is ceil(1000 / 256) = 4. Now fetch one (1) byte from the source.
  • Scale this one byte from the 0-255 range to 0-3 range (or 1-4) and let it determine which group we use. E.g. if the byte was 250, we would choose the 4th group (which represents the last 250 numbers, 750-1000 in our range).
  • Now fetch another byte and scale from 0-255 to 0-250 and let that determine the position within the group we have. So if this second byte is e.g. 120, then our final integer is 750 + 120 = 870.

In那个场景我们只需要总共获取2个字节。然而,它要复杂得多,好像我们的范围是0-1000000,我们需要几个组。

In that scenario we only needed to fetch 2 bytes in total. However, it's much more complex as if our range is 0-1000000 we need several "groups".

我如何实现这样的东西?我对Java / C#/ JavaScript代码或伪代码没问题。

How do I implement something like this? I'm okay with Java/C#/JavaScript code or pseudo code.

我还希望保持结果不会丢失熵/随机性。所以,我有点担心缩放整数。

I'd also like to keep the result from not losing entropy/randomness. So, I'm slightly worried of scaling integers.

推荐答案

不幸的是你的方法#1坏了。例如,如果min为0且最大为510,则需要添加2个字节。只有一种方法可以获得0结果:两个字节都为零。这个机会是(1/256)^ 2。然而,有很多方法可以获得其他值,例如100 = 100 + 0,99 + 1,98 + 2 ......所以100的概率要大得多:101(1/256)^ 2.

Unfortunately your Approach #1 is broken. For example if min is 0 and max 510, you'd add 2 bytes. There is only one way to get a 0 result: both bytes zero. The chance of this is (1/256)^2. However there are many ways to get other values, say 100 = 100+0, 99+1, 98+2... So the chance of a 100 is much larger: 101(1/256)^2.

或多或少做标准的方法是:

The more-or-less standard way to do what you want is to:

Let R = max - min + 1   -- the number of possible random output values
Let N = 2^k >= mR, m>=1  -- a power of 2 at least as big as some multiple of R that you choose.
loop
   b = a random integer in 0..N-1 formed from k random bits
while b >= mR -- reject b values that would bias the output
return min + floor(b/m)

这称为拒绝方法。它会丢弃随机选择的二进制数字,这会对输出产生偏差。如果 min-max + 1 恰好是2的幂,那么你将有零拒绝。

This is called the method of rejection. It throws away randomly selected binary numbers that would bias the output. If min-max+1 happens to be a power of 2, then you'll have zero rejections.

如果你有 m = 1 min-max + 1 只是2的一个大的力量,然后拒绝接近一半。在这种情况下,你肯定想要更大的 m

If you have m=1 and min-max+1 is just one more than a biggish power of 2, then rejections will be near half. In this case you'd definitely want bigger m.

一般来说,较大的m值会导致较少的拒绝,但当然它们每个数字需要更多的位数。选择 m 是一种可能的最佳算法。

In general, bigger m values lead to fewer rejections, but of course they require slighly more bits per number. There is a probabilitistically optimal algorithm to pick m.

此处介绍的其他一些解决方案存在问题,但我对不起,我现在没有时间发表评论。如果有兴趣,也许在几天内。

Some of the other solutions presented here have problems, but I'm sorry right now I don't have time to comment. Maybe in a couple of days if there is interest.

这篇关于如何有效地将几个字节转换为范围之间的整数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆