使用不是2的幂的基数的基本编码效率 [英] Base-encoding efficiency using bases that are not powers of 2

查看：84 发布时间：2020/6/3 21:03:52 algorithm bit-manipulation base

本文介绍了使用不是2的幂的基数的基本编码效率的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

Base64 将三个8位字符编码为四个（基于base-64）6-位字符。 Base64是有效的，因为它使用的底数（64）和指数（4）恰好与2的底数10指数（24）完全匹配：3x8 = 4x6 = 24和2 ²⁴ = 64 < sup> 4 = 16777216。

Base64 encodes three 8-bit characters onto onto four (base-64) 6-bit "characters". Base64 is efficient because it uses a base (64) and exponent (4) that happens to perfectly match a base-10 exponent of 2 (24): 3x8=4x6=24 and 2²⁴=64⁴=16777216.

似乎没有基数/指数组合产生的值与以10为底的2指数完全匹配（特别是2个ⁿ表示任何0lt; n < 256），除了base32，base64和base128（以及base4，base8，base16，base256），base512等，更难以实际使用）。 请参阅最后一个代码块以获取匹配指数的完整列表！

It appears that there are no base/exponent combinations that result in values that exactly match base-10 exponents of 2 (specifically 2ⁿ for any 0<n<256), except for base32, base64 and base128 (along with base4, base8, base16, base256, base512, etc which are more difficult to make practical use of). See the last code block for the full list of matching exponents!

以base92为例，92 ² = 8464，其中2的最接近的10底指数是2 ¹³ =8192。8464-8192 = 272个可在base92端使用的索引（如果我理解正确的话）是无法利用的的。
（2 ¹⁹ = 524288< 91 ³ = 753571< 2 ²⁰ = 1048576，但2 ²⁰ -91 ³ = 295005。272是明显的赢家。）

To use the example of base92, 92² = 8464, with the closest base-10 exponent of 2 being 2¹³ = 8192. 8464-8192 = 272 indices available on the base92 side that (if I understand correctly) are not possible to take advantage of. (2¹⁹=524288 < 91³=753571 < 2²⁰=1048576, but 2²⁰-91³=295005. 272 is the clear winner.)

在今天下午早些时候思考这些有损问题时，我到达了一个有趣的问题。如果我一次查看输入的14位，并且遇到过像 00100000 11111111 这样的输入序列，我可以将其解释为 10000011111111 或8447，并将其编码在8193< n < 8464区域中，否则该区域将无法访问！但是，一旦达到 10000100010001 8465，我就需要退回使用7位编码。这使该方法对输入敏感，并且实际上仅对起始字节为10000100（132，ASCII字母 Z）的两字节序列有用。我宁愿我的编码器对输入不敏感。

While pondering these lossiness problems earlier this afternoon, I arrived at an interesting question. If I look at input 14 bits at a time, and I ever come across an input sequence like 00100000 11111111, I could interpret that as 10000011111111, or 8447, and encode it within the area 8193<n<8464 that is otherwise inaccessible! However, once I reach 10000100010001, 8465, I would need to drop back to using 7-bit encoding. This makes this approach input-sensitive, and in practice would only be useful for two-byte sequences with a starting byte of 10000100 (132, ASCII 'Z'). I'd rather my encoder not be input-sensitive.

我有两个问题：

我对base- n 编码的理解似乎是正确的吗？除了Wikipedia页面，我还可以去哪里了解它？

Does my understanding of base-n encoding seem to be generally correct? Besides the Wikipedia page, where else can I go to learn about it?

我可以使用任何技巧或技术来提高base- n 压缩除base32 / 64/128以外的任意基？

Are there any tricks or techniques I can use to raise the efficiency of base-n compression for an arbitrary base other than base32/64/128?

低于此点的所有内容

这个小型的PHP脚本会计算所有可能的base ^指数（对于0lt; base <513和2 n 的每个可能指数（对于0 < n < 256）。然后，它将两个计算的结果相减，并以升序列出结果。

This small PHP script computes every possible base^exponent (for 0<base<513 and 2<exponent<257, both picked arbitrarily) alongside every possible exponent of 2ⁿ (for 0<n<256). It then subtracts the results of the two calculations, and lists the results in ascending order.

它是为Linux编写的，但是如果您将 fprintf（）调用（包含一些转义序列）。

It was written for Linux but should run fine on Windows if you comment out the fprintf() calls (which contain some escape sequences).

请注意，此程序将生成30,145行输出：）

<?php

$c = 0;
$op = 0;

for ($i = 3; $i < 513; $i++) {
    for ($j = 2; $j < 257; $j++) {
        $p = ($c * 100) / 33835;
        if ($p > $op + 5 || $p == 100) {
            fprintf(STDERR, "\rgen:  %2.1f%%", $p);
            $op = $p;
        }
        if ($i >= 2 ** 32 || $j >= 2 ** 32 || $i ** $j >= 2 ** 32) continue;
        for ($k = 1; $k < 33; $k++) {
            $x = $i ** $j - 2 ** $k;
            if ($x < 0) continue;
            if (!isset($a[$x])) $a[$x] = [];
            $a[$x][] = [ $i, $j, $k, $i ** $j, 2 ** $k, (float)sprintf("%2.1f", ((2 ** $k) / ($i ** $j)) * 100) ];
            $c++;
        }
    }
}

foreach ($a as $i => $_) {
    if ($i < 0) {
        print "\r64-bit machine required\n";
        die;
    }
}

$q = $op = $c = 0;
foreach ($a as $i => $_) {
    $p = ($c * 100) / 28013;
    if ($p > $op + 5 || $p == 100) {
        fprintf(STDERR, "\rsort: %2.1f%%", $p);
        $op = $p;
    }
    $c++;
    uasort($a[$i], function($a, $b) {
        return $a[0] < $b[0];
    });
}

fprintf(STDERR, "\rsort root\e[K");
uksort($a, function($a, $b) {
    return $a > $b;
});

fprintf(STDERR, "\r\e[K");
$l = 0;
foreach ($a as $i => $z) {
    foreach ($z as $x) {
        print
            sprintf("%5d %5.1f%%", $l++, $x[5])
            .' ('.$x[0].'^'.$x[1].'='.sprintf("%.0f", $x[0] ** $x[1]).')'
            .'-(2^'.$x[2].'='.(2 ** $x[2]).')'
            .'='.$i
            ."\n";

    }
}

正如在该问题开头的第二段末尾所指出的，这是与0完全匹配2 ⁿ的指数组合的完整列表； n < 256。

As noted at the end of the second paragraph at the start of this question, here is the full list of exponent combinations that perfectly match 2ⁿ for 0<n<256.

格式为：行号；（2 ^ n）/（基本^指数）以百分比表示；（base ^ exponent = result）-（2 ^ exponent）= distance。

Format is: line number; (2^n) / (base^exponent) represented as a percentage; (base^exponent=result)-(2^exponent)=distance.

请注意第9行的base64。Base92位于上述程序的完整输出的第200行。

Note base64 on line 9. Base92 is on line 200 of the full output of the program above.

    0 100.0% (512^3=134217728)-(2^27=134217728)=0
    1 100.0% (512^2=262144)-(2^18=262144)=0
    2 100.0% (256^3=16777216)-(2^24=16777216)=0
    3 100.0% (256^2=65536)-(2^16=65536)=0
    4 100.0% (128^4=268435456)-(2^28=268435456)=0
    5 100.0% (128^3=2097152)-(2^21=2097152)=0
    6 100.0% (128^2=16384)-(2^14=16384)=0
    7 100.0% (64^3=262144)-(2^18=262144)=0
    8 100.0% (64^2=4096)-(2^12=4096)=0
    9 100.0% (64^4=16777216)-(2^24=16777216)=0
   10 100.0% (64^5=1073741824)-(2^30=1073741824)=0
   11 100.0% (32^6=1073741824)-(2^30=1073741824)=0
   12 100.0% (32^5=33554432)-(2^25=33554432)=0
   13 100.0% (32^4=1048576)-(2^20=1048576)=0
   14 100.0% (32^3=32768)-(2^15=32768)=0
   15 100.0% (32^2=1024)-(2^10=1024)=0
   16 100.0% (16^2=256)-(2^8=256)=0
   17 100.0% (16^7=268435456)-(2^28=268435456)=0
   18 100.0% (16^6=16777216)-(2^24=16777216)=0
   19 100.0% (16^5=1048576)-(2^20=1048576)=0
   20 100.0% (16^4=65536)-(2^16=65536)=0
   21 100.0% (16^3=4096)-(2^12=4096)=0
   22 100.0% (8^10=1073741824)-(2^30=1073741824)=0
   23 100.0% (8^8=16777216)-(2^24=16777216)=0
   24 100.0% (8^7=2097152)-(2^21=2097152)=0
   25 100.0% (8^6=262144)-(2^18=262144)=0
   26 100.0% (8^5=32768)-(2^15=32768)=0
   27 100.0% (8^4=4096)-(2^12=4096)=0
   28 100.0% (8^3=512)-(2^9=512)=0
   29 100.0% (8^2=64)-(2^6=64)=0
   30 100.0% (8^9=134217728)-(2^27=134217728)=0
   31 100.0% (4^3=64)-(2^6=64)=0
   32 100.0% (4^8=65536)-(2^16=65536)=0
   33 100.0% (4^4=256)-(2^8=256)=0
   34 100.0% (4^5=1024)-(2^10=1024)=0
   35 100.0% (4^6=4096)-(2^12=4096)=0
   36 100.0% (4^7=16384)-(2^14=16384)=0
   37 100.0% (4^13=67108864)-(2^26=67108864)=0
   38 100.0% (4^9=262144)-(2^18=262144)=0
   39 100.0% (4^10=1048576)-(2^20=1048576)=0
   40 100.0% (4^11=4194304)-(2^22=4194304)=0
   41 100.0% (4^12=16777216)-(2^24=16777216)=0
   42 100.0% (4^14=268435456)-(2^28=268435456)=0
   43 100.0% (4^15=1073741824)-(2^30=1073741824)=0
   44 100.0% (4^2=16)-(2^4=16)=0

使用不是2的幂的基数的基本编码效率 [英] Base-encoding efficiency using bases that are not powers of 2

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

使用不是2的幂的基数的基本编码效率 [英] Base-encoding efficiency using bases that are not powers of 2

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭