How to calculate the entropy of a file?
Question
How do I calculate the entropy of a file (or, let's say, just a bunch of bytes)?
I have an idea, but I'm not sure that it's mathematically correct.
My idea is the following:
- Create an array of 256 integers (all zeros).
- Traverse the file and, for each of its bytes,
increment the corresponding position in the array.
- At the end: Calculate the "average" value for the array.
- Initialize a counter with zero,
and for each of the array's entries:
add the difference between the entry and the "average" to the counter.
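The first two steps above (a 256-entry array of per-byte counts) can be sketched in Python as follows. This is only a minimal illustration; the function name `byte_counts` is an assumption made for this example and does not come from the original post.

```python
def byte_counts(data):
    """Return a list of 256 integers: how often each byte value occurs in data."""
    counts = [0] * 256          # one slot per possible byte value, all zeros
    for b in data:              # iterating over bytes yields integers 0..255
        counts[b] += 1
    return counts


if __name__ == "__main__":
    sample = b"hello"
    counts = byte_counts(sample)
    print(counts[ord("l")])     # number of times the byte 'l' occurs in the sample
```

For a real file, the bytes could be obtained with `open(path, "rb").read()`, where `path` is whatever file you want to inspect.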
Well, now I'm stuck. How do I "project" the counter result so that all results lie between 0.0 and 1.0? But I'm sure the idea is inconsistent anyway...
I hope someone has a better and simpler solution.
Note: I need the whole thing in order to make assumptions about the file's contents
(plaintext, markup, compressed, or some binary, ...).
Answer
Regarding these steps from the question:
- At the end: Calculate the "average" value for the array.
- Initialize a counter with zero, and for each of the array's entries: add the difference between the entry and the "average" to the counter.
With some modifications you can get Shannon's entropy:
rename "average" to "entropy"
(float) entropy = 0
for each i in 0..255 do              // Counts is the array[256] of byte counts
    (float) p = (float) Counts[i] / filesize   // floating-point division, not integer division
    if (p > 0) entropy = entropy - p * lg(p)   // lg is the logarithm with base 2
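The pseudocode above could be written in Python roughly like this. It is a sketch under the assumption that the whole input fits in memory as a `bytes` object; the function name `shannon_entropy` is made up for this example. The result is in bits per byte, so it lies between 0 and 8.

```python
import math


def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0                      # assumption: define empty input as zero entropy

    counts = [0] * 256                  # histogram of byte values
    for b in data:
        counts[b] += 1

    filesize = len(data)
    entropy = 0.0
    for count in counts:
        if count > 0:                   # skip unused byte values; log2(0) is undefined
            p = count / filesize        # relative frequency of this byte value
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy(b"aaaa"))             # a single repeated byte
    print(shannon_entropy(bytes(range(256))))   # every byte value exactly once
```

A single repeated byte gives the minimum entropy, while an input in which all 256 byte values are equally frequent gives the maximum.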
Edit: As Wesley mentioned, we must divide the entropy by 8 to bring it into the range 0..1, since a byte carries at most 8 bits of entropy (alternatively, we can use a logarithm with base 256).
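To see why dividing by 8 and using base 256 are equivalent, here is a short derivation. The symbols $p_i$ and $H$ are notation introduced for this sketch, with $p_i = \text{Counts}[i] / \text{filesize}$.

```latex
\begin{align*}
H &= -\sum_{i=0}^{255} p_i \log_2 p_i,
  \qquad 0 \le H \le \log_2 256 = 8, \\
\frac{H}{8} &= -\sum_{i=0}^{255} p_i \,\frac{\log_2 p_i}{\log_2 256}
  = -\sum_{i=0}^{255} p_i \log_{256} p_i
  \;\in\; [0, 1].
\end{align*}
```

The upper bound of 8 is reached when all 256 byte values are equally likely ($p_i = 1/256$), and the lower bound of 0 when the file consists of a single repeated byte value. Terms with $p_i = 0$ are skipped, as in the pseudocode.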