I need help to analyze this programming technique to compress an array


Problem description

I hope readers are aware of Shannon's information theory, which says that the information content associated with an event a with probability p(a) is -log(p(a)). In layman's terms, if you need to represent a number in the range 0-7, with all eight values equally likely, then you require at least -log(1/8) = log(8) (where the base is 2), i.e. 3 bits.
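As a quick check of the formula above, here is a minimal Python sketch. The function names `information_bits` and `bits_for_range` are made up for illustration, and the second one assumes a fixed-width code over equally likely values.

```python
import math


def information_bits(p: float) -> float:
    """Shannon information content, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def bits_for_range(n_values: int) -> int:
    """Whole bits needed by a fixed-width code that distinguishes n_values values."""
    if n_values < 1:
        raise ValueError("n_values must be at least 1")
    # (n - 1).bit_length() is ceil(log2(n)) computed without floating point.
    return (n_values - 1).bit_length()


print(information_bits(1 / 8))  # one of 8 equally likely values
print(bits_for_range(8))        # fixed-width code for the range 0-7
print(bits_for_range(256))      # fixed-width code for the range 0-255
```

The integer version is the one that matters for the bit counts later in the question, since a real code cannot spend a fractional number of bits on a single field.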

Suppose there is an array of integers ranging from 0 to 255. Instead of storing the array as 8-bit numbers, I will first sort the array in ascending order (keeping a backup, of course). Instead of encoding every array element as an 8-bit integer, I will output its position in the sorted array. Now the problem is to let the decoder or receiver know this sorted array. I will output the first (least) integer value as an 8-bit number, then the increment to be added to this number, and so on. First comes all of the sorted array, followed by the order of the elements, i.e. the position values.

Ex: original array-> 231 , 3 , 45 , 0 , 23 , 32 , 78

sorted array-> 0,3,23,32,45,78,231

The encoded info is 0 (the first element of the sorted array, as an 8-bit number), then 3 (the increment over 0), then 20, then 9, then 13, then 33, then 153.

After sending the first number and the successive deltas, I will send the order. Since there are 7 integers here, I will need a three-bit number for each position: 3 (the position of 0 in the original array), then 1 (position of 3), then 4 (position of 23), then 5 (position of 32), then 2 (position of 45), then 6 (position of 78), then 0 (position of 231).

I.e. the position values are now 3, 1, 4, 5, 2, 6, 0.
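The sort-then-delta step and the position list can be sketched in a few lines of Python. This is only an illustration of the description above: the names `encode` and `decode` are hypothetical, and the sketch produces lists of integers rather than a packed bit stream.

```python
from itertools import accumulate


def encode(values):
    """Return (deltas, order): the first sorted value followed by successive
    increments, and the original position of each sorted element."""
    if not values:
        return [], []
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_vals = [values[i] for i in order]
    deltas = [sorted_vals[0]] + [b - a for a, b in zip(sorted_vals, sorted_vals[1:])]
    return deltas, order


def decode(deltas, order):
    """Rebuild the original array from the deltas and the position list."""
    sorted_vals = list(accumulate(deltas))  # running sum undoes the deltas
    out = [0] * len(order)
    for value, pos in zip(sorted_vals, order):
        out[pos] = value
    return out


original = [231, 3, 45, 0, 23, 32, 78]
deltas, order = encode(original)
print(deltas)  # [0, 3, 20, 9, 13, 33, 153]
print(order)   # [3, 1, 4, 5, 2, 6, 0]
print(decode(deltas, order) == original)
```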

Analysis to see if this scheme will compress:

first number -> 8 bits (it may actually require fewer bits since it is the smallest)

next 6 numbers -> 5 bits each (the problem is that we can encode 3, 20, 9, 13 with 5 bits but not 33 and 153, which we might have to encode as 31, the maximum for 5 bits)

7 positions of 3 bits each -> 21 bits

total -> 8 + 6*5 + 21 = 59, which is more than the 56 bits we would have required to encode 7 numbers of 8 bits each. We have achieved expansion rather than compression, and our scheme is lossy, since some large numbers cannot be represented properly.
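The tally above can be reproduced with a small Python sketch. The helper name `fixed_width_cost` and its default widths (8 bits for the first value, 5 bits per delta) are assumptions taken from this example, not a general recommendation.

```python
def fixed_width_cost(deltas, first_bits=8, delta_bits=5):
    """Total bits for the fixed-width scheme, plus the deltas that do not fit.

    deltas[0] is the smallest value; the rest are successive increments.
    """
    n = len(deltas)
    position_bits = (n - 1).bit_length() if n else 0  # bits per position value
    total = first_bits + (n - 1) * delta_bits + n * position_bits
    # Any delta of 2**delta_bits or more cannot be stored, so the scheme is lossy.
    overflow = [d for d in deltas[1:] if d >= (1 << delta_bits)]
    return total, overflow


total, overflow = fixed_width_cost([0, 3, 20, 9, 13, 33, 153])
print(total)     # 59, compared with 7 * 8 = 56 for the raw array
print(overflow)  # [33, 153]
```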

Let us add some complexity to this scheme.

I will encode the first number, 0, as an 8-bit number, immediately followed by the code for the last number, 231. Then I will send the code for 3, the next increment over 0, then the code for 153, the decrement from 231, then 20, then 33, 9, 13.

I.e. I have sent the deltas in a different order: instead of 0, 3, 20, 9, 13, 33, 153, after 0 and 231 I will send 3, 153, 20, 33, 9, 13.

What I gain from this is a successive reduction in dynamic range. Observe that we have sent 0, then 231, then 3, then 153; by this time the range of the remaining values has shrunk. I mean that the next increment over 3, which will be 20, cannot take the value past the second-last number, i.e. 78, so the increment cannot go beyond 75 (if it did, the third number, say 3 + 76, would be greater than 78, clearly a violation of our sorting assumption).
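One way to see how much the shrinking range helps is to count the bits each field would need if it is sized to the gap still open between the two ends. The sketch below is one reading of the outside-in idea, not a worked-out codec: `outside_in_widths` is a hypothetical name, and it assumes the decoder tracks the same bounds and that values lie in 0..255.

```python
def outside_in_widths(sorted_vals, max_value=255):
    """Bits per field when values are sent alternately from the low and high
    ends, each as a distance that fits inside the gap still open."""
    lo_bound, hi_bound = 0, max_value
    i, j = 0, len(sorted_vals) - 1
    widths = []
    take_low = True
    while i <= j:
        span = hi_bound - lo_bound        # the distance sent lies in 0..span
        widths.append(span.bit_length())  # whole bits for span + 1 possibilities
        if take_low:
            lo_bound = sorted_vals[i]
            i += 1
        else:
            hi_bound = sorted_vals[j]
            j -= 1
        take_low = not take_low
    return widths


widths = outside_in_widths([0, 3, 23, 32, 45, 78, 231])
print(widths, sum(widths))
```

On this example the early fields still need the full 8 bits, because the range only tightens after several values have been sent.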

If you have understood the idea so far, I have a further improved scheme that uses the binary search idea to reduce the dynamic range further and put this technique on steroids. Here is the sorted array:

0 , 3 , 23 , 32 , 45 , 78 , 231

Observe that the sorted array has 7 numbers and the middle one is 32. So now we will encode this 32 with 8 bits, then send the deltas level by level (breadth-first order). I.e. the next number after 32 will be 3, which will be encoded as 29 (i.e. 32-3), and the next one will be 78, encoded as 46 (78-32), then 0 encoded as 3 (3-0), then 23 encoded as 20 (23-3), then 45 encoded as 33 (78-45), then the last one, 231, encoded as 153 (231-78).

As you can now see, we can decide how many bits to use for each number on a case-by-case basis.

We will be sending the sorted array as 32 (range 0-255, so 8 bits), 29 (range 0-32, so 6 bits), 46 (range 32-255, so 8 bits), 3 (range 0-3, so 2 bits), 20 (range 3-32, so 5 bits), 33 (range 32-78, so 6 bits), 153 (range 78-255, so 8 bits).

So the total is 8+6+8+2+5+6+8 = 43 bits, which is lossless but more than our initial estimate of 38 (8 bits + 6*5 bits). Adding the 7 position values of three bits each gives 43+21 = 64, which is more than 56. Our scheme is still expanding.
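The per-field widths listed above can be derived mechanically. The following Python sketch walks the sorted array the way the binary-search scheme describes, sending the middle element of each sub-array and sizing its field to the bounds already known. The name `binary_split_fields` is hypothetical, and values are assumed to lie in 0..255.

```python
from collections import deque


def binary_split_fields(sorted_vals, max_value=255):
    """Return (value, low_bound, high_bound, bits) tuples in the order sent."""
    fields = []
    # Each queue entry is (first index, last index, low bound, high bound).
    queue = deque([(0, len(sorted_vals) - 1, 0, max_value)])
    while queue:
        first, last, lo, hi = queue.popleft()
        if first > last:
            continue  # empty sub-array, nothing to send
        mid = (first + last) // 2
        value = sorted_vals[mid]
        bits = (hi - lo).bit_length()  # value is known to lie in lo..hi
        fields.append((value, lo, hi, bits))
        queue.append((first, mid - 1, lo, value))  # left half is bounded above by value
        queue.append((mid + 1, last, value, hi))   # right half is bounded below by value
    return fields


fields = binary_split_fields([0, 3, 23, 32, 45, 78, 231])
for value, lo, hi, bits in fields:
    print(f"{value:3d} in {lo}-{hi}: {bits} bits")
print(sum(bits for _, _, _, bits in fields))  # 43
```

The total depends on the data: a different array of seven values can cost more or fewer than 43 bits under the same rule.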

What improvement can we make to the position numbers, which take 21 bits? Every time we send a position, the number of remaining positions reduces by one, so if we have 7 positions to send then the number of bits is log(7)+log(6)+log(5)+...+log(1). This is log(7!) bits, where all logarithms are base 2.

Observe that I have used the formula log(a)+log(b) = log(ab).

This is equal to 12.299, which added to 43 gives 55.299, a tad lower than 56. But this is not practical if each position takes a whole number of bits. We need at least 3 (range 7) + 3 (range 6) + 3 (range 5) + 2 (range 4) + 2 (range 3) + 1 (range 2) + 0 (range 1) = 14 bits, which added to 43 gives 57, which is expansion.
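Both figures for the position data can be checked with a short Python sketch. The function names are illustrative only; the first returns the ideal fractional cost and the second the cost when each position gets its own whole-bit field.

```python
import math


def permutation_bits_ideal(n: int) -> float:
    """log2(n!): the information needed to pick one of the n! orderings."""
    return math.log2(math.factorial(n))


def permutation_bits_stepwise(n: int) -> int:
    """Bits used when each position gets its own field, sized to the
    number of positions still unused (n, n-1, ..., 1)."""
    return sum((k - 1).bit_length() for k in range(1, n + 1))


print(round(permutation_bits_ideal(7), 3))  # 12.299
print(permutation_bits_stepwise(7))         # 14
```

The gap between the two numbers is the rounding loss from giving every position a separate field.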

The goal of this effort is to achieve at least 1 bit reduction in data size. If we could compress 56 bits into 55 without any assumptions about the data, then we could take the 55-bit output and compress it again to 54 bits, and so on. This looks impossible, and the idea is similar to perpetual motion machines. The task now is to see what stops us from compressing more.

I need to analyze a bigger array as an example to see whether the sorted array can take fewer than 43 bits. Also, what is the advantage of splitting an array into many parts and encoding each part separately? Another goal is to find the formula for the number of bits required to represent a sorted array, i.e. given an array size and the range of the array elements, how to find numbers like 43.
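One candidate for such a formula, offered as a standard counting result rather than anything stated in the question: a sorted array of n values drawn from a range of m values (repeats allowed) is a multiset, and there are C(n + m - 1, n) of those, so a fixed-length code needs at least log2 of that count. A minimal sketch, with the hypothetical name `sorted_array_bits`:

```python
import math


def sorted_array_bits(n: int, range_size: int = 256) -> float:
    """Lower bound, in bits, for a fixed-length code that can represent any
    sorted array of n values taken from range_size possible values."""
    multisets = math.comb(n + range_size - 1, n)  # stars-and-bars count
    return math.log2(multisets)


bits = sorted_array_bits(7)
print(bits)                              # just under 44
print(bits + math.log2(math.factorial(7)))  # sorted array plus ordering
```

For n = 7 the bound comes to a little under 44 bits, so the 43 bits found above is a good case for that particular array rather than a guaranteed cost. Adding log2(7!) for the ordering lands slightly above 56 bits, which is in line with the counting argument in the answer.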

Let's take 3, 1, 4, 5, 2, 6, 0 as an unsorted array again and observe that this sequence is one of the 5040 permutations of the seven numbers from 0 to 6. We can represent this as a 13-bit number (12.299 bits, as theory suggests).
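A common way to turn a permutation into a single number below n! is to rank it through its Lehmer code. The question does not name a method, so the sketch below is an assumption about how the 13-bit number could be produced; `permutation_rank` and `permutation_unrank` are hypothetical names.

```python
def permutation_rank(perm):
    """Map a permutation of 0..n-1 to an integer in the range 0..n!-1."""
    remaining = list(range(len(perm)))
    rank = 0
    for p in perm:
        idx = remaining.index(p)  # position among the values not yet used
        rank = rank * len(remaining) + idx
        remaining.pop(idx)
    return rank


def permutation_unrank(rank, n):
    """Inverse of permutation_rank."""
    digits = []
    for radix in range(1, n + 1):  # peel off digits, last position first
        rank, d = divmod(rank, radix)
        digits.append(d)
    digits.reverse()
    remaining = list(range(n))
    return [remaining.pop(d) for d in digits]


perm = [3, 1, 4, 5, 2, 6, 0]
r = permutation_rank(perm)
print(r, r.bit_length())  # the rank is below 5040, so it fits in 13 bits
print(permutation_unrank(r, 7) == perm)
```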

I need to know whether it is possible to compress this array even more.

Solution

The goal of this effort is to achieve at least 1 bit reduction in data size.

That is not possible over all inputs. You can waste a great deal of effort in trying to properly count the bits in various representations, make mistakes, fix them, etc., when all you really need to do is count how many cases there are.

There are 2^k possible inputs, where k is the number of bits in the input. Let's say you believe that you have a k-1 bit representation of every single input. Then there are 2^(k-1) possible representations. If you feed every single one of those 2^(k-1) representations to your decompressor, you will obviously only get 2^(k-1) results. The other 2^(k-1) possible inputs are missing in action. There is no way to generate those missing inputs from your representation, which means that in fact your representation cannot cover all of the possible 2^k inputs. At least half of them are not covered.
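The counting argument can be tried out on small inputs. The Python sketch below enumerates every k-bit string, runs it through a candidate compressor, and reports the first two inputs that end up with the same code. The name `find_collision` and the toy compressor `drop_last_bit` are made up for illustration; any function that returns fewer than k bits can be plugged in.

```python
from itertools import product


def find_collision(compress, k):
    """Return two distinct k-bit inputs that share a code, or None."""
    seen = {}
    for bits in product("01", repeat=k):
        original = "".join(bits)
        code = compress(original)
        if len(code) >= k:
            raise ValueError("compressor must output fewer than k bits")
        if code in seen:
            return seen[code], original  # the decompressor cannot tell these apart
        seen[code] = original
    return None


def drop_last_bit(s):
    """Toy 'compressor': keep the first k-1 bits."""
    return s[:-1]


print(find_collision(drop_last_bit, 3))  # ('000', '001')
```

There are 2^k inputs but only 2^k - 1 bit strings shorter than k bits, so the search finds a pair for every compressor that passes the length check.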
