如何使用Python将霍夫曼编码写入文件？ [英] How to write Huffman coding to a file using Python?

查看：180 发布时间：2020/10/7 2:21:56 python encoding compression huffman-code

本文介绍了如何使用Python将霍夫曼编码写入文件？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我创建了一个Python脚本来使用霍夫曼算法压缩文本。假设我有以下字符串：

I created a Python script to compress text by using the Huffman algorithm. Say I have the following string:

string = 'The quick brown fox jumps over the lazy dog'

运行我的算法会返回以下位：

Running my algorithm returns the following 'bits':

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'

的比较结果
使用输入字符串，该算法似乎可以正常工作：

By comparing the amount of bits of the result with the input string, the algorithm seems to work:

>>> print len(result), len(string) * 8
194 344

但是现在来了问题：如何将其写入文件，同时仍然能够对其进行解码。您只能按字节而不是按位写入文件。通过将代码写为字节，根本就没有压缩！

But now comes the question: how do I write this to a file, while still being able to decode it. You can only write to a file per byte, not per bit. By writing the 'codes' as bytes, there is no compression at all!

我是计算机科学的新人，而在线资源对我来说并不有用。

I am new at computer science, and the online resources just don't cut it for me. All help is much appreciated!

编辑：请注意，我的代码是这样的（在另一个输入字符串<$ c $的情况下） c>'xxxxxxxyzz'）：

note that I had my codes something like this (in case of another input string 'xxxxxxxyzz'):

{'y': '00', 'x': '1', 'z': '10'}

我创建结果字符串的方式是通过按输入字符串的顺序串联这些代码：

The way I create the resulting string is by concatenating these codes in order of the input string:

result = '1111111001010'

如何从此结果返回到原始字符串？还是我完全错了？谢谢！

How to get back to the original string from this result? Or am I getting this completely wrong? Thank you!

推荐答案

首先，您需要将输入字符串转换为字节：

First you need to convert your input string to bytes:

def _to_Bytes(data):
  b = bytearray()
  for i in range(0, len(data), 8):
    b.append(int(data[i:i+8], 2))
  return bytes(b)

然后打开一个文件以二进制模式写入：

Then, open a file to write in binary mode:

result = '01111100111010101111010011111010000000011000111000010111110111110010100110010011010100101111100011110001000110101100111101000010101101110110111000111010101110010111111110011000101101000110111000'
with open('test.bin', 'wb') as f:
  f.write(_to_Bytes(result))

现在，将原始字符串写入文件，可以进行字节比较：

Now, writing the original string to a file, a comparison of bytes can take place:

import os
with open('test_compare.txt', 'a') as f:
  f.write('The quick brown fox jumps over the lazy dog')

_o = os.path.getsize('test_compare.txt')
_c = os.path.getsize('test.bin')
print(f'Original file: {_o} bytes')
print(f'Compressed file: {_c} bytes')
print('Compressed file to about {}% of original'.format(round((((_o-_c)/_o)*100), 0)))

输出：

Original file: 43 bytes
Compressed file: 25 bytes
Compressed file to about 42.0% of original

回到原始位置，您可以编写一个确定字符可能顺序的函数：

To get back to the original, you can write a function that determines the possible ordering of characters:

d = {'y': '00', 'x': '1', 'z': '10'}
result = '1111111001010'
from typing import Generator
def reverse_encoding(content:str, _lookup) -> Generator[str, None, None]:
  while content:
    _options = [i for i in _lookup if content.startswith(i) and (any(content[len(i):].startswith(b) for b in _lookup) or not content[len(i):])]
    if not _options:
      raise Exception("Decoding error")
    yield _lookup[_options[0]]
    content = content[len(_options[0]):]

print(''.join(reverse_encoding(result, {b:a for a, b in d.items()})))

输出：

'xxxxxxxyzz'

这篇关于如何使用Python将霍夫曼编码写入文件？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何使用Python将霍夫曼编码写入文件？ [英] How to write Huffman coding to a file using Python?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

如何使用Python将霍夫曼编码写入文件？ [英] How to write Huffman coding to a file using Python?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭