Implementing run-length encoding
Question
I've written a program to perform run-length encoding. In a typical scenario, if the text is
AAAAAABBCDEEEEGGHJ
run-length encoding would turn it into
A6B2C1D1E4G2H1J1
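For reference, the classic scheme just described (every run, even a run of one, becomes the character followed by its count) can be sketched in a few lines of Python. The function name `rle_encode` is hypothetical, and the sketch works on `str` for readability rather than on raw BMP bytes.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Classic RLE: every run, including runs of length 1, becomes char + count."""
    # groupby yields (char, iterator over that run) for each run of equal chars.
    return "".join(f"{ch}{len(list(group))}" for ch, group in groupby(text))


print(rle_encode("AAAAAABBCDEEEEGGHJ"))  # A6B2C1D1E4G2H1J1
```

The trailing `1` after C, D, H and J is exactly the overhead the question complains about.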
but it was adding an extra 1 for each non-repeating character. Since I'm compressing BMP files with it, I went with the idea of placing a marker "$" to signify the occurrence of a repeating character (assuming that image files contain a large amount of repeated data).
So it looks like
$A6$B2CD$E4$G2HJ
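A minimal sketch of the marker variant described above, assuming (as the example implies) that only runs of two or more characters get the `$<char><count>` form and single characters are copied unchanged. The name `encode_with_marker` is hypothetical.

```python
from itertools import groupby


def encode_with_marker(text: str, marker: str = "$") -> str:
    """Runs of 2+ become marker + char + count; single chars are copied as-is."""
    parts = []
    for ch, group in groupby(text):
        run = len(list(group))
        parts.append(f"{marker}{ch}{run}" if run >= 2 else ch)
    return "".join(parts)


print(encode_with_marker("AAAAAABBCDEEEEGGHJ"))  # $A6$B2CD$E4$G2HJ
```

Note that input which happens to look like `$I9` has no runs, so this encoder passes it through unchanged.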
For the current example the length is the same, but there's a noticeable difference for BMP files. Now my problem is in decoding. It so happens that some BMP files contain the pattern $<char><num>, i.e. $I9, in the original file, so the compressed file would contain the same text, $I9.

However, upon decoding it is treated as a run of I repeated 9 times! So it produces wrong output. What I want to know is which symbol I can use to mark the start of a repeating character (run) so that it doesn't conflict with the original source.
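To make the failure concrete, here is a minimal sketch of a naive decoder for the marker format, assuming the count is written in decimal digits as in the example. The name `decode_with_marker` is hypothetical. It round-trips the example, but it cannot tell an encoded run from a literal `$I9` that was copied through from the source file.

```python
import re


def decode_with_marker(encoded: str, marker: str = "$") -> str:
    """Naive decoder: every marker + char + digits is assumed to be a run."""
    # DOTALL so that the run character may also be a newline.
    pattern = re.compile(re.escape(marker) + r"(.)(\d+)", re.DOTALL)
    return pattern.sub(lambda m: m.group(1) * int(m.group(2)), encoded)


print(decode_with_marker("$A6$B2CD$E4$G2HJ"))  # AAAAAABBCDEEEEGGHJ
print(decode_with_marker("$I9"))               # IIIIIIIII, but the source was "$I9"
```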
Recommended answer
Why don't you encode each $ in the original file as $$ in the compressed file?
And/or use some other character instead of $, one that is rarely used in BMP files.
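A minimal sketch of the `$$` escape idea, working on raw bytes. Two details are assumptions beyond the answer: the run count is stored as a single byte (3 to 255) instead of decimal digits, which also avoids a second ambiguity where a literal digit follows a run (`$A6` followed by `7` would otherwise read as 67), and the marker byte itself is never run-encoded, only escaped. `MARKER`, `encode` and `decode` are hypothetical names.

```python
MARKER = ord("$")  # marker byte, as in the question


def encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == MARKER:
            # Escape every literal marker as a doubled marker; never run-encode it.
            out += bytes([MARKER, MARKER])
            i += 1
            continue
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if run >= 3:
            # Run: marker, byte value, count as one byte (3..255).
            out += bytes([MARKER, b, run])
        else:
            # Runs of 1 or 2 are cheaper as literals than as a 3-byte run.
            out += bytes([b]) * run
        i += run
    return bytes(out)


def decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != MARKER:
            out.append(b)
            i += 1
        elif data[i + 1] == MARKER:
            out.append(MARKER)  # "$$" is an escaped literal "$"
            i += 2
        else:
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
    return bytes(out)


print(encode(b"$I9"))          # b'$$I9'
print(decode(encode(b"$I9")))  # b'$I9'
```

Because a `$` in the output is always followed by either another `$` or a non-marker byte plus a count, the decoder never has to guess.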
Also note that the BMP format has RLE compression built in: look here, near the bottom of the page, under "Image Data and Compression".
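As a rough illustration of that built-in scheme, here is a minimal sketch of a decoder for the 8-bit variant (BI_RLE8) as Microsoft documents it: a nonzero count byte followed by a palette index is a run, and a zero count byte introduces an escape (0 = end of line, 1 = end of bitmap, 2 = delta with two offset bytes, 3 to 255 = that many literal indices padded to a 16-bit boundary). The function name `decode_rle8` and filling delta-skipped pixels with 0 are assumptions for illustration; rows come back in stream order, so for a bottom-up BMP row 0 is the bottom scanline.

```python
def decode_rle8(data: bytes, width: int, height: int) -> list[list[int]]:
    """Decode BI_RLE8 pixel data into rows of palette indices (stream order)."""
    rows = [[0] * width for _ in range(height)]
    x = y = 0
    i = 0
    while i + 1 < len(data):
        first, second = data[i], data[i + 1]
        i += 2
        if first > 0:
            # Encoded mode: palette index `second` repeated `first` times.
            for _ in range(first):
                if x < width and y < height:
                    rows[y][x] = second
                x += 1
        elif second == 0:  # escape: end of line
            x, y = 0, y + 1
        elif second == 1:  # escape: end of bitmap
            break
        elif second == 2:  # escape: delta, next two bytes are dx and dy
            x += data[i]
            y += data[i + 1]
            i += 2
        else:
            # Absolute mode: `second` literal indices, padded to a 16-bit boundary.
            for value in data[i:i + second]:
                if x < width and y < height:
                    rows[y][x] = value
                x += 1
            i += second + (second & 1)
    return rows


encoded = bytes([3, 5, 0, 3, 1, 2, 3, 0, 0, 0, 2, 7, 0, 1])
print(decode_rle8(encoded, width=6, height=2))  # [[5, 5, 5, 1, 2, 3], [7, 7, 0, 0, 0, 0]]
```

Notice that the format sidesteps the question's collision by reserving a count of 0 as its escape, instead of reserving a character that could appear in the pixel data.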
I don't know what you're using your program for, or whether it's just for learning, but if you used the "official" BMP method, your compressed images wouldn't need to be decompressed before viewing.