使用不是2的幂的基数的基本编码效率 [英] Base-encoding efficiency using bases that are not powers of 2

查看:84
本文介绍了使用不是2的幂的基数的基本编码效率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Base64 将三个8位字符编码为四个(基于base-64)6-位字符。 Base64是有效的,因为它使用的底数(64)和指数(4)恰好与2的底数10指数(24)完全匹配:3x8 = 4x6 = 24和2 24 = 64 < sup> 4 = 16777216。

Base64 encodes three 8-bit characters onto onto four (base-64) 6-bit "characters". Base64 is efficient because it uses a base (64) and exponent (4) that happens to perfectly match a base-10 exponent of 2 (24): 3x8=4x6=24 and 224=644=16777216.

似乎没有基数/指数组合产生的值与以10为底的2指数完全匹配(特别是2个 n 表示任何0lt; n < 256),除了base32,base64和base128(以及base4,base8,base16,base256) ,base512等,更难以实际使用)。 请参阅最后一个代码块以获取匹配指数的完整列表!

It appears that there are no base/exponent combinations that result in values that exactly match base-10 exponents of 2 (specifically 2n for any 0<n<256), except for base32, base64 and base128 (along with base4, base8, base16, base256, base512, etc which are more difficult to make practical use of). See the last code block for the full list of matching exponents!

以base92为例,92 2 = 8464,其中2的最接近的10底指数是2 13 =8192。8464-8192 = 272个可在base92端使用的索引(如果我理解正确的话)是无法利用的的。
(2 19 = 524288< 91 3 = 753571< 2 20 = 1048576,但2 20 -91 3 = 295005。272是明显的赢家。)

To use the example of base92, 922 = 8464, with the closest base-10 exponent of 2 being 213 = 8192. 8464-8192 = 272 indices available on the base92 side that (if I understand correctly) are not possible to take advantage of. (219=524288 < 913=753571 < 220=1048576, but 220-913=295005. 272 is the clear winner.)

在今天下午早些时候思考这些有损问题时,我到达了一个有趣的问题。如果我一次查看输入的14位,并且遇到过像 00100000 11111111 这样的输入序列,我可以将其解释为 10000011111111 或8447,并将其编码在8193< n < 8464区域中,否则该区域将无法访问!但是,一旦达到 10000100010001 8465,我就需要退回使用7位编码。这使该方法对输入敏感,并且实际上仅对起始字节为10000100(132,ASCII字母 Z)的两字节序列有用。我宁愿我的编码器对输入不敏感。

While pondering these lossiness problems earlier this afternoon, I arrived at an interesting question. If I look at input 14 bits at a time, and I ever come across an input sequence like 00100000 11111111, I could interpret that as 10000011111111, or 8447, and encode it within the area 8193<n<8464 that is otherwise inaccessible! However, once I reach 10000100010001, 8465, I would need to drop back to using 7-bit encoding. This makes this approach input-sensitive, and in practice would only be useful for two-byte sequences with a starting byte of 10000100 (132, ASCII 'Z'). I'd rather my encoder not be input-sensitive.

我有两个问题:


  1. 我对base- n 编码的理解似乎是正确的吗?除了Wikipedia页面,我还可以去哪里了解它?

  1. Does my understanding of base-n encoding seem to be generally correct? Besides the Wikipedia page, where else can I go to learn about it?

我可以使用任何技巧或技术来提高base- n 压缩除base32 / 64/128以外的任意基?

Are there any tricks or techniques I can use to raise the efficiency of base-n compression for an arbitrary base other than base32/64/128?

 

低于此点的所有内容

这个小型的PHP脚本会计算所有可能的base ^指数(对于0lt; base <513和2 n 的每个可能指数(对于0 < n < 256)。然后,它将两个计算的结果相减,并以升序列出结果。

This small PHP script computes every possible base^exponent (for 0<base<513 and 2<exponent<257, both picked arbitrarily) alongside every possible exponent of 2n (for 0<n<256). It then subtracts the results of the two calculations, and lists the results in ascending order.

它是为Linux编写的,但是如果您将 fprintf()调用(包含一些转义序列)。

It was written for Linux but should run fine on Windows if you comment out the fprintf() calls (which contain some escape sequences).

请注意,此程序将生成30,145行输出:)

<?php

$c = 0;
$op = 0;

for ($i = 3; $i < 513; $i++) {
    for ($j = 2; $j < 257; $j++) {
        $p = ($c * 100) / 33835;
        if ($p > $op + 5 || $p == 100) {
            fprintf(STDERR, "\rgen:  %2.1f%%", $p);
            $op = $p;
        }
        if ($i >= 2 ** 32 || $j >= 2 ** 32 || $i ** $j >= 2 ** 32) continue;
        for ($k = 1; $k < 33; $k++) {
            $x = $i ** $j - 2 ** $k;
            if ($x < 0) continue;
            if (!isset($a[$x])) $a[$x] = [];
            $a[$x][] = [ $i, $j, $k, $i ** $j, 2 ** $k, (float)sprintf("%2.1f", ((2 ** $k) / ($i ** $j)) * 100) ];
            $c++;
        }
    }
}

foreach ($a as $i => $_) {
    if ($i < 0) {
        print "\r64-bit machine required\n";
        die;
    }
}

$q = $op = $c = 0;
foreach ($a as $i => $_) {
    $p = ($c * 100) / 28013;
    if ($p > $op + 5 || $p == 100) {
        fprintf(STDERR, "\rsort: %2.1f%%", $p);
        $op = $p;
    }
    $c++;
    uasort($a[$i], function($a, $b) {
        return $a[0] < $b[0];
    });
}

fprintf(STDERR, "\rsort root\e[K");
uksort($a, function($a, $b) {
    return $a > $b;
});

fprintf(STDERR, "\r\e[K");
$l = 0;
foreach ($a as $i => $z) {
    foreach ($z as $x) {
        print
            sprintf("%5d %5.1f%%", $l++, $x[5])
            .' ('.$x[0].'^'.$x[1].'='.sprintf("%.0f", $x[0] ** $x[1]).')'
            .'-(2^'.$x[2].'='.(2 ** $x[2]).')'
            .'='.$i
            ."\n";

    }
}






正如在该问题开头的第二段末尾所指出的,这是与0完全匹配2 n 的指数组合的完整列表; n < 256。


As noted at the end of the second paragraph at the start of this question, here is the full list of exponent combinations that perfectly match 2n for 0<n<256.

格式为:行号; (2 ^ n)/(基本^指数)以百分比表示; (base ^ exponent = result)-(2 ^ exponent)= distance。

Format is: line number; (2^n) / (base^exponent) represented as a percentage; (base^exponent=result)-(2^exponent)=distance.

请注意第9行的base64。Base92位于上述程序的完整输出的第200行。

Note base64 on line 9. Base92 is on line 200 of the full output of the program above.

    0 100.0% (512^3=134217728)-(2^27=134217728)=0
    1 100.0% (512^2=262144)-(2^18=262144)=0
    2 100.0% (256^3=16777216)-(2^24=16777216)=0
    3 100.0% (256^2=65536)-(2^16=65536)=0
    4 100.0% (128^4=268435456)-(2^28=268435456)=0
    5 100.0% (128^3=2097152)-(2^21=2097152)=0
    6 100.0% (128^2=16384)-(2^14=16384)=0
    7 100.0% (64^3=262144)-(2^18=262144)=0
    8 100.0% (64^2=4096)-(2^12=4096)=0
    9 100.0% (64^4=16777216)-(2^24=16777216)=0
   10 100.0% (64^5=1073741824)-(2^30=1073741824)=0
   11 100.0% (32^6=1073741824)-(2^30=1073741824)=0
   12 100.0% (32^5=33554432)-(2^25=33554432)=0
   13 100.0% (32^4=1048576)-(2^20=1048576)=0
   14 100.0% (32^3=32768)-(2^15=32768)=0
   15 100.0% (32^2=1024)-(2^10=1024)=0
   16 100.0% (16^2=256)-(2^8=256)=0
   17 100.0% (16^7=268435456)-(2^28=268435456)=0
   18 100.0% (16^6=16777216)-(2^24=16777216)=0
   19 100.0% (16^5=1048576)-(2^20=1048576)=0
   20 100.0% (16^4=65536)-(2^16=65536)=0
   21 100.0% (16^3=4096)-(2^12=4096)=0
   22 100.0% (8^10=1073741824)-(2^30=1073741824)=0
   23 100.0% (8^8=16777216)-(2^24=16777216)=0
   24 100.0% (8^7=2097152)-(2^21=2097152)=0
   25 100.0% (8^6=262144)-(2^18=262144)=0
   26 100.0% (8^5=32768)-(2^15=32768)=0
   27 100.0% (8^4=4096)-(2^12=4096)=0
   28 100.0% (8^3=512)-(2^9=512)=0
   29 100.0% (8^2=64)-(2^6=64)=0
   30 100.0% (8^9=134217728)-(2^27=134217728)=0
   31 100.0% (4^3=64)-(2^6=64)=0
   32 100.0% (4^8=65536)-(2^16=65536)=0
   33 100.0% (4^4=256)-(2^8=256)=0
   34 100.0% (4^5=1024)-(2^10=1024)=0
   35 100.0% (4^6=4096)-(2^12=4096)=0
   36 100.0% (4^7=16384)-(2^14=16384)=0
   37 100.0% (4^13=67108864)-(2^26=67108864)=0
   38 100.0% (4^9=262144)-(2^18=262144)=0
   39 100.0% (4^10=1048576)-(2^20=1048576)=0
   40 100.0% (4^11=4194304)-(2^22=4194304)=0
   41 100.0% (4^12=16777216)-(2^24=16777216)=0
   42 100.0% (4^14=268435456)-(2^28=268435456)=0
   43 100.0% (4^15=1073741824)-(2^30=1073741824)=0
   44 100.0% (4^2=16)-(2^4=16)=0


推荐答案

base128无效,因为您必须使用大于'128'的字符巫婆代码。对于> = 128 chrome的字符女巫代码,请发送两个字节...(因此,字符串女巫在发送时将其中的1MB字符更改为2MB字节...因此您失去了所有收益)。对于base64字符串,这种现象不会出现(因此,我们只松动了约33%)。 此处位于更新部分

base128 is not effective because you must use characters witch codes greater than '128'. For charater witch codes >=128 chrome send two bytes... (so string witch 1MB of this characters on sending will be change to 2MB bytes... so you loose all profit). For base64 strings this phenomena does't appear (so we loose only ~33%). More details here in "update" section.

这篇关于使用不是2的幂的基数的基本编码效率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆