我可以压缩多少个由40个不同字符组成的400个字符? [英] How much can I compress 400 characters made up of 40 different characters?

查看:96
本文介绍了我可以压缩多少个由40个不同字符组成的400个字符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个400个字符串,由40个可能的字符组成.这40个字符是 a-z A-N .在此示例中,我有:

I have a 400 character string made up of a variety of 40 possible characters. The 40 characters are a-z A-N. In this example I have:

<代码> nnnnnnnnnnnnnnnnnnnnnnnnunnnnnnnnnnnnuuunnnuxunnnnnnnnnnuxddnnnuxduuuuuunnnnuxddnnnuxddddddunnnnuxddnuuuxdddddduuuuuuxddnnnuxduuudducuccuxddnnnuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduiuiiuxddnnnuxdddddducuccuxddnnnuxdddddduiuiiuxddnnnuxduuudducuccuxddnuuuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduuuuuuxddnnnuxxxxxxxunnnnuxxxnnnnuuuuuuuunnnnnuuunnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

我需要使压缩版本由URL安全字符组成.我正在使用 a-z A-N 0-9-._〜()'!*:@.; 表示原始字符串中的字符最多重复23次.如果一个字符重复超过23次,则该字符和数字表示将添加到先前的压缩字符串上.

I need to have the compressed version be made out of URL-safe characters. I am using a-z A-N and 0-9 - . _ ~ ( ) ' ! * : @ . ; to represent how many times characters in the original string repeat up to 23 times. If a character repeats more than 23 times, the character and numerical representation is added onto the prior compressed string.

压缩后,结果字符串为246个字符:

After compressing, the resulting string is 246 characters:

<代码> N;未u1n1uxun8uxd0n1uxdu4n2uxd0n1uxd4un2uxd0nu1xd4u4xd0n1uxdu1d0ucuc0uxd0n1uxdunod0uiui0uxd0n1uxduo0d0ucuc0uxd0n1uxd4uiui0uxd0n1uxd4ucuc0uxd0n1uxd4uiui0uxd0n1uxdu1d0ucuc0uxd0nu1xdunod0uiui0uxd0n1uxduo0d0ucuc0uxd0n1uxd4u4xd0n1ux5un2ux1n2u6n3u1n; N(

用于压缩原始字符串的Javascript函数是:

The Javascript function to compress the original string is:

// 40 single URL-safe characters 
let singleCharList = [
"a",
"b",
"c",
"d",
"e",
"f",
"g",
"h",
"i",
"j",
"k",
"l",
"m",
"n",
"o",
"p",
"q",
"r",
"s",
"t",
"u",
"v",
"w",
"x",
"y",
"z",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
]

// 23 single URL-safe characters for counting (0 is actually 1)
let singleCharCountList = [
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"-",
".",
"_",
"~",
"(",
")",
"'",
"!",
"*",
":",
"@",
",",
";",
]

function compressString(stringToCompress) {
  var compressedString = ""
  var lastChar = ""
  var inARowCount = 0
  var maxInARowCount = singleCharCountList.length
  for (i = 0; i < stringToCompress.length + 1; ++i) {
let currentChar = stringToCompress.charAt(i)
if (currentChar === lastChar && inARowCount < maxInARowCount) {
  inARowCount += 1
} else {
  if (inARowCount > 0) {
    singleCharCount = singleCharCountList[inARowCount - 1]
    compressedString = compressedString + lastChar + singleCharCount
  } else {
    compressedString = compressedString + lastChar
  }
  inARowCount = 0
}
lastChar = currentChar
  }
  return compressedString
}

// To inflate the string back to its original 400 characters, I have: 

function inflateString(stringToInflate) {
  var inflatedString = ""
  var lastChar = ""
  for (i = 0; i < stringToInflate.length; ++i) {
let currentChar = stringToInflate.charAt(i)
// currentChar is part of singleCharCountList 
// and lastChar is part of singleCharList
if (singleCharCountList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) > -1) {
  var loop = singleCharCountList.indexOf(currentChar) + 1
  for (var j = 0; j < loop; ++j) {
    inflatedString = inflatedString + lastChar
  }
}
// currentChar is part of singleCharList
// and lastChar is part of singleCharCountList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharCountList.indexOf(lastChar) > -1) {
  inflatedString = inflatedString + currentChar
}
// currentChar is part of singleCharList
// and lastChar is part of singleCharList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) > -1) {
  inflatedString = inflatedString + currentChar
}
// current char is part of singleCharList
// and lastChar is not part of singleCharList or singleCharCountList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) == -1 &&
  singleCharCountList.indexOf(lastChar) == -1) {
  inflatedString = inflatedString + currentChar
}
lastChar = currentChar
  }
  return inflatedString
}


const str = 'nnnnnnnnnnnnnnnnnnnnnnnnunnnnnnnnnnnnuuunnnuxunnnnnnnnnnuxddnnnuxduuuuuunnnnuxddnnnuxddddddunnnnuxddnuuuxdddddduuuuuuxddnnnuxduuudducuccuxddnnnuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduiuiiuxddnnnuxdddddducuccuxddnnnuxdddddduiuiiuxddnnnuxduuudducuccuxddnuuuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduuuuuuxddnnnuxxxxxxxunnnnuxxxnnnnuuuuuuuunnnnnuuunnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn';

const compressed = compressString(str);
console.log(compressed.length);
const inflate = inflateString(compressed)
console.log(str.length,inflateString.length)

我可以做得更好吗?在我的示例中,我对246感到满意,但这取决于原始内容.

Can I do better? I'm moderately happy with 246 in my example, but this can vary a lot depending on the original content.

推荐答案

首先使用标准压缩工具(例如zlib)压缩字符串.您的示例使用zlib从400字节压缩到93.然后将二进制结果编码为75个字符的受限集,这将使其扩展约30%,从而产生约120个字节.仍比原来的400压缩了很多.

First compress the string using a standard compression tool, such as zlib. Your example compresses from 400 bytes down to 93 with zlib. Then encode the binary result into your restricted set of 75 characters, which will expand it by about 30%, resulting in about 120 bytes. Still significantly compressed from the original 400.

可以采用多种方法进行编码,但是最快的方法可能是反向霍夫曼编码,在这种情况下,您可以使用二进制表示序列化的霍夫曼代码序列,其概率分布平均为75个符号.您最终得到53个6位代码和22个7位代码.这样,每个符号为6.29位,接近每个符号最佳的6.23位.

That encoding can be done many ways, but the fastest is likely reverse Huffman coding, where you take the binary to represent a sequence of Huffman codes of a flat probability distribution of 75 symbols. You end up with 53 6-bit codes and 22 7-bit codes. That gives 6.29 bits per symbol, close to the optimal 6.23 bits per symbol.

开发自己的,不合标准的压缩方法而对游程长度编码的尝试很少,这是浪费时间.

It is a waste of time to develop your own, substandard, compression method with a weak attempt at run-length coding.

这篇关于我可以压缩多少个由40个不同字符组成的400个字符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆