Encoding byte data into digits
Problem description
Is there a common method to encode and decode arbitrary data so the encoded end result consists of numbers only - like base64_encode but without the letters?
Fictitious example:
$encoded = numbers_encode("Mary had a little lamb");
echo $encoded; // outputs e.g. 12238433742239423742322 (fictitious result)
$decoded = numbers_decode("12238433742239423742322");
echo $decoded; // outputs "Mary had a little lamb"
You can think of a (single byte character) string as a base-256 encoded number where "\x00" represents 0, ' ' (space, i.e., "\x20") represents 32 and so on until "\xFF", which represents 255.
A representation that uses only the digits 0-9 can be obtained simply by converting that number to base 10.
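To make the base-256 view concrete, here is a minimal sketch for strings short enough to fit in a native PHP integer. The function name `shortStringToInt` is made up for illustration, and treating the first byte as the least significant one (little-endian) is an assumption chosen to match the bcmath code further down.

```php
<?php
// Hypothetical helper: interpret a short byte string as a base-256 number.
// Assumption: little-endian, i.e. the first byte is the least significant digit.
// Only safe for up to 7 bytes on 64-bit PHP; longer input needs arbitrary precision.
function shortStringToInt(string $s): int {
    $value = 0;
    // Horner's scheme, starting from the most significant (last) byte.
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $value = $value * 256 + ord($s[$i]);
    }
    return $value;
}

echo shortStringToInt("Hi"), "\n"; // 72 + 105 * 256 = 26952
```

Printing the integer already gives its base-10 form, which is why the conversion looks trivial here; the real work only starts once the number no longer fits in a machine integer.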
Note that "base64 encoding" is not actually a base conversion. base64 breaks the input into groups of 3 bytes (24 bits) and does the base conversion on those groups individually. This works well because a number with 24 bits can be represented with four digits in base 64 (2^24 = 64^4).
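The grouping can be illustrated with a small sketch that converts a single 3-byte group into four base-64 digits by hand and compares the result with PHP's built-in `base64_encode`. The helper name `encodeGroup` is hypothetical, and padding of incomplete groups is deliberately left out.

```php
<?php
// Hypothetical helper: encode exactly one 3-byte group as four base-64 digits.
// Assumption: $group is exactly 3 bytes long (no padding handling).
function encodeGroup(string $group): string {
    // The standard base64 alphabet: digit value 0..63 -> character.
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
    // Read the 3 bytes as one 24-bit big-endian number.
    $n = (ord($group[0]) << 16) | (ord($group[1]) << 8) | ord($group[2]);
    $out = '';
    // Extract four 6-bit digits, most significant first.
    for ($shift = 18; $shift >= 0; $shift -= 6) {
        $out .= $alphabet[($n >> $shift) & 0x3F];
    }
    return $out;
}

var_dump(encodeGroup("Man") === base64_encode("Man")); // bool(true)
```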
This is more or less what el.pescado does: he splits the input data into 8-bit pieces and converts each piece to base 10. However, this technique has one disadvantage relative to base64 encoding: it does not align with the byte boundary. To represent an 8-bit number (0-255 when unsigned) we need three digits in base 10, but the left-most digit carries less information than the others, since it can only be 0, 1 or 2 (for unsigned numbers).
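A per-byte scheme of this kind might look like the following sketch. It is not a reproduction of el.pescado's actual code, which is not shown here; the names `perByteEncode` and `perByteDecode` and the fixed width of three digits are assumptions for illustration.

```php
<?php
// Hypothetical helpers sketching a per-byte scheme: every byte becomes
// exactly three decimal digits (000-255), so decoding can split on a fixed width.
function perByteEncode(string $bytes): string {
    if ($bytes === '') {
        return '';
    }
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $out .= sprintf('%03d', ord($byte));
    }
    return $out;
}

function perByteDecode(string $digits): string {
    if ($digits === '') {
        return '';
    }
    $out = '';
    foreach (str_split($digits, 3) as $triple) {
        $out .= chr((int) $triple);
    }
    return $out;
}

echo perByteEncode("Mary"), "\n"; // 077097114121
```

The output is always exactly three times as long as the input, and the first digit of every triple is wasted on the values 0, 1 and 2.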
A digit in base 10 stores log(10)/log(2) ≈ 3.32 bits. Because no power of 10 is also a power of 2, no chunk size you choose will ever let you align the decimal representation with 8-bit bytes (in the sense of "aligning" I've described in the previous paragraph). Consequently, the most compact representation is a full base conversion (which you can see as a "base encoding" with only one big chunk).
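The alignment argument can be checked numerically. The sketch below counts how many decimal digits are needed to cover every value of a chunk of a given size, and compares the cost per byte with the theoretical lower bound. The helper name `digitsForChunk` and the sampled chunk sizes are arbitrary choices for illustration.

```php
<?php
// Hypothetical helper: how many decimal digits are needed so that every
// value of a $chunkBytes-byte chunk (0 .. 2^(8n) - 1) can be written out.
function digitsForChunk(int $chunkBytes): int {
    // A number below 2^bits has at most floor(bits * log10(2)) + 1 digits.
    return (int) floor($chunkBytes * 8 * log10(2)) + 1;
}

foreach ([1, 2, 3, 4, 8] as $n) {
    $d = digitsForChunk($n);
    printf("%d byte(s): %d digits, %.3f digits per byte\n", $n, $d, $d / $n);
}
// The bound that no fixed chunk size can reach exactly.
printf("lower bound: %.3f digits per byte\n", 8 * log10(2));
```

Every fixed chunk size rounds up to a whole number of digits, so it always stays above the bound; converting the whole input as one number pays that rounding cost only once.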
Here is an example with bcmath.
bcscale(0);
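function base256ToBase10(string $string) {
    // argument is little-endian: $string[0] is the least significant byte
    $result = "0";
    for ($i = strlen($string)-1; $i >= 0; $i--) {
        // bcmath operates on numeric strings, so cast the integers explicitly
        $result = bcadd($result,
            bcmul((string) ord($string[$i]), bcpow("256", (string) $i)));
    }
    return $result;
}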
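function base10ToBase256(string $number) {
    // note: zero bytes at the end of the original string are the most
    // significant digits, so they vanish in the number and are not restored
    $result = "";
    $n = $number;
    do {
        $remainder = bcmod($n, "256");
        $n = bcdiv($n, "256"); // integer division, since bcscale(0) is set above
        $result .= chr((int) $remainder);
    } while (bccomp($n, "0") > 0);
    return $result;
}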
For
$string = "Mary had a little lamb";
$base10 = base256ToBase10($string);
echo $base10,"\n";
$base256 = base10ToBase256($base10);
echo $base256;
we get
36826012939234118013885831603834892771924668323094861
Mary had a little lamb
Since each decimal digit encodes only log(10)/log(2) ≈ 3.32193 bits, each byte needs about 8/3.32193 ≈ 2.41 digits, so expect the number to be about 140% longer than the input (not 200% longer, as it would be with el.pescado's answer).
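Where the bcmath extension is not available, the same base conversion can be sketched in plain PHP with schoolbook arithmetic on decimal digits. This is an illustrative alternative, not part of the answer above. The names `bytesToDecimal` and `decimalToBytes` are hypothetical, the byte order is assumed to be little-endian as in the bcmath version, and the approach is quadratic in the input length, so it only suits short inputs.

```php
<?php
// Hypothetical helper: little-endian byte string -> decimal digit string.
function bytesToDecimal(string $bytes): string {
    $digits = [0]; // decimal digits, least significant first
    // Horner's scheme, starting from the most significant (last) byte.
    for ($i = strlen($bytes) - 1; $i >= 0; $i--) {
        $carry = ord($bytes[$i]);
        // digits = digits * 256 + current byte
        foreach ($digits as $k => $d) {
            $v = $d * 256 + $carry;
            $digits[$k] = $v % 10;
            $carry = intdiv($v, 10);
        }
        while ($carry > 0) {
            $digits[] = $carry % 10;
            $carry = intdiv($carry, 10);
        }
    }
    return implode('', array_reverse($digits));
}

// Hypothetical helper: decimal digit string -> little-endian byte string.
function decimalToBytes(string $decimal): string {
    $digits = array_map('intval', str_split($decimal)); // most significant first
    $bytes = '';
    do {
        // Long division of the whole number by 256.
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $cur = $remainder * 10 + $d;
            $q = intdiv($cur, 256);
            $remainder = $cur % 256;
            if ($quotient !== [] || $q > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $bytes .= chr($remainder); // the remainder is the next base-256 digit
        $digits = $quotient;
    } while ($digits !== []);
    return $bytes;
}

$encoded = bytesToDecimal("Mary had a little lamb");
echo $encoded, "\n";
echo decimalToBytes($encoded), "\n";
```

As with the bcmath version, zero bytes at the end of the input are the most significant digits and are lost in the round trip.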