lempel-ziv压缩算法的实现 [英] lempel-ziv compression algorithm implemention
问题描述
我为以下示例字符串编写了以下代码以实现lempel-ziv压缩算法:
I wrote the following code for implementing lempel-ziv compression algorithm for the following sample string:
AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE
代码:
keys=[]
text = open('test').read() # contain of the string: AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE
index=0
t=time.time()
def sub_strin_check(text,index_num):
n = 1
while True:
substring = text[index_num:index_num+n]
if substring not in keys :
print(substring)
keys.append(substring)
# print(keys[-1])
return (index_num+n)
else:
n = n+1
continue
while True:
try:
if text[index] not in keys:
# print (index)
keys.append(text[index])
print(keys.append(text[index]),text[index])
except:
break
else:
try:
index = sub_strin_check(text,index)
print(index)
print(index)
index = index + 1
except:
break
res = str(keys)
print(res)
with open("result","w") as f:
f.write(res)
但结果是:
['A', 'A', 'AA', 'AB', 'C', 'C', 'CD', 'ABC', 'ABCA', 'ABCD', 'E', 'E', 'EE', 'EEC', 'B', 'B', 'BB', 'BBD', 'AAE']
我的想法是在字符串(文本)中使用索引号,并检查切成薄片的子字符串是否在键字典或螺母中退出(如果不存在),将其添加.如果存在,请继续并通过添加下一个字符来检查子字符串.
My idea is working with index number in string(text) and check if the substring that is sliced exit in keys dictionary or nut if it's not there add it. If it exists continue and check the substring by adding the next character.
有任何帮助请看看我的错误在哪里?
Any help please to see where is my mistake?
PS:我知道互联网上有一些lempel-ziv python代码,但是我必须使用此代码.
PS: I know there are some lempel-ziv python code on internet, but I have to use this code.
PPS:
lempel ziv算法以这种方式工作.如果未在键(字典)中退出,则检查给定字符串中的第一个字符;如果在检查字符串中的下一个字符时退出,则检查该字符串;如果不退出,则检查此新的子字符串;如果不退出,则添加子字符串;如果在键中退出,则添加下一个字符,此过程继续...例如,对于我的字符串,输出应为:[A,AA,AB,B,C,D,ABC,AAA,BC,DE,E,EE,EEE,CB,BB,BBB,DD,AAE]
PPS:
The lempel ziv algorithm works in this way. checks the first character in the given string if it not exit in the keys (dictionary) if it exits in the checks for the next character in the string and checks this new substring if it not exit adds substring and if exits in keys it add the next character and this process continue...for example for my string the output should be: [A,AA,AB,B,C,D,ABC,AAA,BC,DE,E,EE,EEE,CB,BB,BBB,DD,AAE]
推荐答案
我将使用字典而不是列表进行查找.然后,如果需要,可以从字典直接转换为列表
I would use the dictionary instead of a list for lookup. Then the conversion from dictionary to list is straight forward if required
input_str = 'AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE'
keys_dict = {}
ind = 0
inc = 1
while True:
if not (len(input_str) >= ind+inc):
break
sub_str = input_str[ind:ind + inc]
print sub_str,ind,inc
if sub_str in keys_dict:
inc += 1
else:
keys_dict[sub_str] = 0
ind += inc
inc = 1
# print 'Adding %s' %sub_str
print list(keys_dict)
输出:
['A', 'AA', 'C', 'B', 'E', 'D', 'BB', 'BC', 'BCD', 'BBB', 'ABC', 'DA', 'EE', 'AB', 'EC', 'BD', 'EEE', 'AAA', 'DAA']
算法参考: https://www.youtube.com/watch?v=EgreLYa-81Y
这篇关于lempel-ziv压缩算法的实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!