Encoding byte data into digits

Problem description

Is there a common method to encode and decode arbitrary data so the encoded end result consists of numbers only - like base64_encode but without the letters?

Fictitious example:

$encoded = numbers_encode("Mary had a little lamb");

echo $encoded; // outputs e.g. 12238433742239423742322 (fictitious result)

$decoded = numbers_decode("12238433742239423742322");

echo $decoded; // outputs "Mary had a little lamb"

Solution

You can think of a (single byte character) string as a base-256 encoded number where "\x00" represents 0, ' ' (space, i.e., "\x20") represents 32 and so on until "\xFF", which represents 255.

A representation that uses only the digits 0-9 can be obtained simply by converting that number to base 10.
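To make the "string as a base-256 number" idea concrete, here is a minimal sketch using native integers. The helper name `smallBase256ToInt` is made up for this illustration, and it assumes the little-endian convention (first byte is the least significant digit) and inputs of at most 7 bytes so the value fits in a 64-bit PHP integer.

```php
<?php
// Illustrative helper (not part of the original answer): interprets a short
// string as a little-endian base-256 number using native integers.
// Assumption: at most 7 bytes, so the value fits in a 64-bit PHP int.
function smallBase256ToInt(string $s): int {
    $value = 0;
    // Walk from the most significant byte (last) to the least (first).
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $value = $value * 256 + ord($s[$i]);
    }
    return $value;
}

// 'H' is 72 and 'i' is 105, so "Hi" is 72 + 105 * 256 = 26952.
echo smallBase256ToInt("Hi"), "\n";
```

Printing that integer with `echo` already gives its base-10 form; longer strings need arbitrary-precision arithmetic for the same conversion.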

Note that "base64 encoding" is not actually a base conversion. base64 breaks the input into groups of 3 bytes (24 bits) and does the base conversion on those groups individually. This works well because a number with 24 bits can be represented with four digits in base 64 (2^24 = 64^4).
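The per-group conversion can be sketched for a single 3-byte group as follows. `encodeGroupBase64` is a hypothetical helper written for this illustration; it assumes the standard base64 alphabet, handles exactly three bytes, and ignores padding.

```php
<?php
// Hypothetical helper for illustration: encodes exactly one 3-byte group
// by treating it as a 24-bit number and writing that number with four
// base-64 digits (6 bits each). Padding of short groups is not handled.
function encodeGroupBase64(string $group): string {
    if (strlen($group) !== 3) {
        throw new InvalidArgumentException('expected exactly 3 bytes');
    }
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    // Big-endian 24-bit value of the group.
    $n = (ord($group[0]) << 16) | (ord($group[1]) << 8) | ord($group[2]);
    $out = '';
    // Emit the four 6-bit digits, most significant first.
    for ($shift = 18; $shift >= 0; $shift -= 6) {
        $out .= $alphabet[($n >> $shift) & 0x3F];
    }
    return $out;
}

echo encodeGroupBase64('Mar'), "\n";  // TWFy
echo base64_encode('Mar'), "\n";      // TWFy
```

Because 2^24 = 64^4, every group maps to exactly four digits with no wasted values, which is the alignment property discussed here.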

This is more or less what el.pescado does: he splits the input data into 8-bit pieces and then converts each piece into base 10. However, this technique has one disadvantage relative to base64 encoding: it does not align with the byte boundary. To represent an 8-bit number (0-255 when unsigned) we need three digits in base 10. However, the left-most digit carries less information than the others; it can only be 0, 1 or 2 (for unsigned numbers).
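A minimal sketch of that per-byte scheme, for comparison. This is not el.pescado's actual code; the function names `encodeBytesAsDecimal` and `decodeBytesFromDecimal` are invented here, and the sketch assumes each byte is written as a zero-padded three-digit decimal number.

```php
<?php
// Sketch of per-byte decimal encoding (names are illustrative only).
// Every byte becomes exactly three decimal digits, "000" to "255".
function encodeBytesAsDecimal(string $data): string {
    $out = '';
    for ($i = 0, $len = strlen($data); $i < $len; $i++) {
        $out .= sprintf('%03d', ord($data[$i]));
    }
    return $out;
}

function decodeBytesFromDecimal(string $digits): string {
    // Accept only complete groups of three digits (or the empty string).
    if (preg_match('/^(?:[0-9]{3})*$/', $digits) !== 1) {
        throw new InvalidArgumentException('expected groups of three digits');
    }
    $out = '';
    for ($i = 0, $len = strlen($digits); $i < $len; $i += 3) {
        $value = (int) substr($digits, $i, 3);
        if ($value > 255) {
            throw new InvalidArgumentException('group out of byte range');
        }
        $out .= chr($value);
    }
    return $out;
}

echo encodeBytesAsDecimal("Mary"), "\n"; // 077097114121
```

The groups 256 to 999 are never produced, which is where the wasted space comes from: the output is always three times as long as the input.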

A digit in base 10 stores log(10)/log(2) ≈ 3.32 bits. No matter which chunk size you choose, you will never be able to align the representation with 8-bit bytes (in the sense of "aligning" described in the previous paragraph), because no power of 256 is also a power of 10. Consequently, the most compact representation is a base conversion of the whole input (which you can see as a "base encoding" with only one big chunk).
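The effect of the chunk size can be checked with a short calculation. The helper `digitsPerChunk` below is hypothetical; it assumes each chunk of k bytes is converted independently and padded to a fixed number of decimal digits.

```php
<?php
// Illustrative calculation: decimal digits needed to represent any chunk
// of $k bytes when chunks are converted independently.
function digitsPerChunk(int $k): int {
    // Smallest m with 10^m >= 256^k, i.e. m = ceil(8k * log10(2)).
    return (int) ceil(8 * $k * log10(2));
}

foreach ([1, 2, 3, 5, 22] as $k) {
    printf("%2d bytes -> %2d digits (%.3f digits per byte)\n",
        $k, digitsPerChunk($k), digitsPerChunk($k) / $k);
}
```

The ratio only approaches 8 * log10(2) ≈ 2.408 digits per byte as the chunk grows, so a single chunk covering the whole input gives the shortest output.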

Here is an example with bcmath.

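bcscale(0); // integer arithmetic only: no fractional digits
// note: trailing "\x00" bytes of the input are most-significant zeros,
// so they are not restored when decoding
function base256ToBase10(string $string): string {
    // the argument is little-endian: byte $i has weight 256^$i
    $result = "0";
    for ($i = strlen($string) - 1; $i >= 0; $i--) {
        $result = bcadd($result,
            bcmul((string) ord($string[$i]), bcpow("256", (string) $i)));
    }
    return $result;
}
function base10ToBase256(string $number): string {
    $result = "";
    $n = $number;
    do {
        // peel off the least significant base-256 digit
        $remainder = bcmod($n, "256");
        $n = bcdiv($n, "256");
        $result .= chr((int) $remainder);
    } while (bccomp($n, "0") > 0); // compare as arbitrary-precision numbers

    return $result;
}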

For

$string = "Mary had a little lamb";
$base10 = base256ToBase10($string);
echo $base10,"\n";
$base256 = base10ToBase256($base10);
echo $base256;

we get

36826012939234118013885831603834892771924668323094861
Mary had a little lamb

Since each digit encodes only log(10)/log(2) ≈ 3.32193 bits, a byte needs about 8/3.32193 ≈ 2.41 digits, so expect the number to be roughly 140% longer than the input (not 200% longer, as it would be with el.pescado's answer).
