lempel-ziv压缩算法的实现 [英] lempel-ziv compression algorithm implemention

查看:417
本文介绍了lempel-ziv压缩算法的实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我为以下示例字符串编写了以下代码以实现lempel-ziv压缩算法:

I wrote the following code for implementing lempel-ziv compression algorithm for the following sample string:

AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE

代码:

keys=[]
text = open('test').read() # contain of the string: AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE
index=0
t=time.time()

def sub_strin_check(text,index_num):
    n = 1
    while True:
        substring = text[index_num:index_num+n]
        if substring not in keys :
            print(substring)
            keys.append(substring)

            # print(keys[-1])
            return (index_num+n)
        else:
            n = n+1
            continue

while True:
    try:
        if text[index] not in keys:
            # print (index)
            keys.append(text[index])

            print(keys.append(text[index]),text[index])
    except:
        break
    else:
        try:
            index = sub_strin_check(text,index)
            print(index)

            print(index)
            index = index + 1
        except:
            break

res = str(keys)
print(res)

with open("result","w") as f:
        f.write(res)

但结果是:

['A', 'A', 'AA', 'AB', 'C', 'C', 'CD', 'ABC', 'ABCA', 'ABCD', 'E', 'E', 'EE', 'EEC', 'B', 'B', 'BB', 'BBD', 'AAE']

我的想法是在字符串(文本)中使用索引号,并检查切成薄片的子字符串是否在键字典或螺母中退出(如果不存在),将其添加.如果存在,请继续并通过添加下一个字符来检查子字符串.

My idea is working with index number in string(text) and check if the substring that is sliced exit in keys dictionary or nut if it's not there add it. If it exists continue and check the substring by adding the next character.

有任何帮助请看看我的错误在哪里?

Any help please to see where is my mistake?

PS:我知道互联网上有一些lempel-ziv python代码,但是我必须使用此代码.

PS: I know there are some lempel-ziv python code on internet, but I have to use this code.

PPS: lempel ziv算法以这种方式工作.如果未在键(字典)中退出,则检查给定字符串中的第一个字符;如果在检查字符串中的下一个字符时退出,则检查该字符串;如果不退出,则检查此新的子字符串;如果不退出,则添加子字符串;如果在键中退出,则添加下一个字符,此过程继续...例如,对于我的字符串,输出应为:[A,AA,AB,B,C,D,ABC,AAA,BC,DE,E,EE,EEE,CB,BB,BBB,DD,AAE]

PPS: The lempel ziv algorithm works in this way. checks the first character in the given string if it not exit in the keys (dictionary) if it exits in the checks for the next character in the string and checks this new substring if it not exit adds substring and if exits in keys it add the next character and this process continue...for example for my string the output should be: [A,AA,AB,B,C,D,ABC,AAA,BC,DE,E,EE,EEE,CB,BB,BBB,DD,AAE]

推荐答案

我将使用字典而不是列表进行查找.然后,如果需要,可以从字典直接转换为列表

I would use the dictionary instead of a list for lookup. Then the conversion from dictionary to list is straight forward if required

input_str = 'AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE'

keys_dict = {}

ind = 0
inc = 1
while True:
    if not (len(input_str) >= ind+inc):
        break
    sub_str = input_str[ind:ind + inc]
    print sub_str,ind,inc
    if sub_str in keys_dict:
        inc += 1
    else:
        keys_dict[sub_str] = 0
        ind += inc
        inc = 1
        # print 'Adding %s' %sub_str

print list(keys_dict)

输出:

['A', 'AA', 'C', 'B', 'E', 'D', 'BB', 'BC', 'BCD', 'BBB', 'ABC', 'DA', 'EE', 'AB', 'EC', 'BD', 'EEE', 'AAA', 'DAA']

算法参考: https://www.youtube.com/watch?v=EgreLYa-81Y

这篇关于lempel-ziv压缩算法的实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆