How can I determine the statistical randomness of a binary string?


Question

How can I determine the statistical randomness of a binary string?

Ergo, how can I code my own test, and return a single value that corresponds to the statistical randomness, a value between 0 and 1.0 (0 being not random, 1.0 being random)?

The test would need to work on binary strings of any size.

When you do it with pen and paper, you might explore strings like this:
  0 (arbitrary randomness, the only other choice is 1)
  00 (not random, it's a repeat and matches the size)
  01 (better, two different values)
  010 (less random, palindrome)
  011 (less random, more 1's, still acceptable)
  0101 (less random, pattern)
  0100 (better, fewer 1's, but any other distribution causes patterns)

Cases:
Size: 1, Possibilities: 2
  0: 1.0 (random)
  1: 1.0 (random)

Size: 2, P:4
  00: ?
  01: 1.0 (random)
  10: 1.0 (random)
  11: ?

S:3, P:8
  000: ? non-random
  001: 1.0 (random)
  010: ? less random
  011: 1.0 (random)
  100: 1.0 (random)
  101: ? less random
  110: 1.0 (random)
  111: ? non-random

And so on.

I suspect this involves breaking the string into all possible substrings and comparing their frequencies, but it seems like this sort of groundwork should already have been done in the early days of computer science.
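That substring-frequency idea can be sketched like this (a rough illustration of the approach, not an established test; the function name and the default block length k are my own choices):

```python
import math
from collections import Counter

def substring_entropy(bits: str, k: int = 2) -> float:
    """Shannon entropy of the overlapping length-k substrings of a
    binary string, normalized to [0, 1] (a k-bit block carries at
    most k bits of entropy)."""
    windows = [bits[i:i + k] for i in range(len(bits) - k + 1)]
    if not windows:
        return 0.0
    counts = Counter(windows)
    n = len(windows)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / k

# A string made of one repeated block scores 0; a string whose
# length-k blocks are varied scores closer to 1.
```

Larger k catches longer-range patterns but needs proportionally longer input strings to give meaningful frequencies.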

Answer

This will give you an entropy count from 0 to 1.0:

You might want to look into Shannon entropy, which is a measure of entropy as applied to data and information. In fact, it is almost a direct analogue of the physical formula for entropy as defined by the most widely accepted interpretations of thermodynamics.

More specifically, in your case, with a binary string, you can see the Binary Entropy Function, which is a special case involving randomness in binary bits of data.

It is calculated by:

H(p) = -p*log(p) - (1-p)*log(1-p)

(logarithms taken to base 2; 0*log(0) is assumed to be 0)

where p is your fraction of 1's (or of 0's; the graph is symmetric, so the answer is the same either way).

Here is what the function yields: [plot of the binary entropy function H(p), an inverted-U curve peaking at 1.0 when p = 0.5]

As you can see, if p is 0.5 (the same number of 1's as 0's), your entropy is at the maximum (1.0). If p is 0 or 1.0, the entropy is 0.

This appears to be just what you want, right?

The only exception is your Size 1 cases, which could simply be treated as a special case. However, 100% 0's and 100% 1's don't seem too entropic to me, so implement them as you'd like.

Also, this does not take into account any "ordering" of the bits. Only the sum total of them. So, repetition/palindromes won't get any boost. You might want to add an extra heuristic for this.
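One possible extra heuristic along those lines (my own suggestion, not part of the original answer) is to apply the same entropy measure to adjacent bit pairs, which does penalize strict patterns such as 0101:

```python
import math
from collections import Counter

def pair_entropy(bits: str) -> float:
    """Entropy of adjacent (overlapping) bit pairs, normalized to [0, 1].
    Unlike H(p) on single bits, a strict pattern such as '0101' reuses
    the same few pairs and therefore scores well below 1.0."""
    pairs = [bits[i:i + 2] for i in range(len(bits) - 1)]
    if not pairs:
        return 0.0
    counts = Counter(pairs)
    n = len(pairs)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / 2.0  # a bit pair carries at most 2 bits of entropy
```

For "0101" this gives about 0.46 (the pairs are 01, 10, 01), whereas H(p) alone would report 1.0 since the bit counts are balanced.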

Here are your other case examples:


00:   -0*log(0) - (1-0)*log(1-0)               = 0.0
01:   -0.5*log(0.5) - (1-0.5)*log(1-0.5)       = 1.0
010:  -(1/3)*log(1/3) - (2/3)*log(2/3)         = 0.92
0100: -0.25*log(0.25) - (1-0.25)*log(1-0.25)   = 0.81
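The formula above can be implemented directly and reproduces the listed values (a minimal sketch; the function name is mine):

```python
import math

def binary_entropy(bits: str) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), where p is the fraction
    of 1's in the string; 0*log(0) is taken to be 0."""
    if not bits:
        return 0.0
    p = bits.count("1") / len(bits)
    if p == 0.0 or p == 1.0:
        return 0.0  # all-0 or all-1 strings: zero entropy by convention
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(binary_entropy("00"), 2))    # 0.0
print(round(binary_entropy("01"), 2))    # 1.0
print(round(binary_entropy("010"), 2))   # 0.92
print(round(binary_entropy("0100"), 2))  # 0.81
```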
