String based data encoding: Base64 vs Base64url

Question

What is the difference between Base64 and Base64url that I see in things like JSON web tokens?

Answer

Both Base64 and Base64url are ways to encode binary data in string form. You can read about the theory of base64 here. The problem with Base64 is that its alphabet contains the characters +, /, and =, which have reserved meanings in URLs and in some filesystem names: / is the path separator, while + and = carry special meaning in URL query strings. Base64url solves this by replacing + with - and / with _. The trailing padding character = can be omitted when it is not needed; if it is kept, a URL would most likely percent-encode it (as %3D). With these changes, the encoded data can be included in a URL without problems.
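As a minimal sketch (Python is chosen here for illustration; the answer itself names no language), the standard-library base64 module exposes both alphabets. The two input bytes are arbitrary, picked only because they map to alphabet indices 62, 63 and 60, so the output shows the substituted characters and the padding:

```python
import base64

# Two arbitrary bytes whose bits map to alphabet indices 62, 63 and 60.
data = b"\xfb\xff"

standard = base64.b64encode(data).decode("ascii")          # alphabet uses + and /
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # alphabet uses - and _
unpadded = url_safe.rstrip("=")                             # drop trailing = padding

print(standard)  # +/8=
print(url_safe)  # -_8=
print(unpadded)  # -_8
```

Stripping the padding with rstrip mirrors what JSON Web Tokens do; whoever decodes such a string has to put the padding back first.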

Here is a chart of the differences:

Index  Base64  Base64Url

0      A       A 
1      B       B 
2      C       C 
3      D       D 
4      E       E 
5      F       F 
6      G       G 
7      H       H 
8      I       I 
9      J       J 
10     K       K 
11     L       L 
12     M       M 
13     N       N 
14     O       O 
15     P       P 
16     Q       Q 
17     R       R 
18     S       S 
19     T       T 
20     U       U 
21     V       V 
22     W       W 
23     X       X 
24     Y       Y 
25     Z       Z 
26     a       a 
27     b       b 
28     c       c 
29     d       d 
30     e       e 
31     f       f 
32     g       g 
33     h       h 
34     i       i 
35     j       j 
36     k       k 
37     l       l 
38     m       m 
39     n       n 
40     o       o 
41     p       p 
42     q       q 
43     r       r 
44     s       s 
45     t       t 
46     u       u 
47     v       v 
48     w       w
49     x       x
50     y       y
51     z       z
52     0       0
53     1       1
54     2       2
55     3       3
56     4       4
57     5       5
58     6       6
59     7       7
60     8       8
61     9       9
62     +       -
63     /       _

       =       (optional)
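The chart can be checked mechanically. This sketch rebuilds both alphabets in index order and confirms that they differ only at indices 62 and 63, which is why converting already-encoded text from one form to the other is a plain character swap. The names BASE64, BASE64URL, TO_URL and TO_STD are illustrative only, and the swap deliberately leaves any = padding untouched:

```python
import string

# Indices 0-61 are identical in both alphabets: A-Z, a-z, 0-9.
common = string.ascii_uppercase + string.ascii_lowercase + string.digits
BASE64 = common + "+/"     # indices 62 and 63 in standard base64
BASE64URL = common + "-_"  # indices 62 and 63 in base64url

# Collect every index at which the two alphabets disagree.
diff = [i for i, (a, b) in enumerate(zip(BASE64, BASE64URL)) if a != b]
print(diff)  # [62, 63]

# Converting encoded text is a character-for-character swap.
TO_URL = str.maketrans("+/", "-_")
TO_STD = str.maketrans("-_", "+/")

print("+/8=".translate(TO_URL))  # -_8=
print("-_8=".translate(TO_STD))  # +/8=
```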

Below I will quote the definitions from the standards.

RFC 4648 specs

4. Base 64 Encoding

The following description of base 64 is derived from [3], [4], [5], and [6]. This encoding may be referred to as "base64".

The Base 64 encoding is designed to represent arbitrary sequences of octets in a form that allows the use of both upper- and lowercase letters but that need not be human readable.

A 65-character subset of US-ASCII is used, enabling 6 bits to be represented per printable character. (The extra 65th character, "=", is used to signify a special processing function.)

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string.
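To make the 24-bit grouping concrete, here is a hand-written sketch that encodes one complete group. It is for illustration only (real code should rely on a library such as Python's base64 module), and the helper name encode_group is hypothetical:

```python
# Standard base64 alphabet (Table 1), indexed 0..63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three: bytes) -> str:
    """Encode exactly 3 octets (one 24-bit group) as 4 characters."""
    if len(three) != 3:
        raise ValueError("encode_group expects exactly 3 bytes")
    # Concatenate the three 8-bit groups, left to right, into one 24-bit integer.
    n = (three[0] << 16) | (three[1] << 8) | three[2]
    # Split into four 6-bit groups, most significant first, and use each as an index.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_group(b"Man"))  # TWFu
```

For input whose length is a multiple of 3, concatenating encode_group over successive 3-byte chunks gives the same result as base64.b64encode; for example, b"abcdef" becomes "YWJj" followed by "ZGVm".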

                  Table 1: The Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 +
    12 M            29 d            46 u            63 /
    13 N            30 e            47 v
    14 O            31 f            48 w         (pad) =
    15 P            32 g            49 x
    16 Q            33 h            50 y

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the '=' character. Since all base 64 input is an integral number of octets, only the following cases can arise:

(1) The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding.

(2) The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters.

(3) The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.
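Extending the grouping idea to the final partial quantum gives a complete, if unoptimized, encoder. This is a sketch of the three cases above, not a reference implementation, and b64encode_sketch is a hypothetical name. It zero-fills the last group on the right, then emits two, three or four data characters followed by the matching number of = signs:

```python
# Standard base64 alphabet (Table 1), indexed 0..63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64encode_sketch(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        nbytes = len(chunk)  # 3 for full groups; 1 or 2 only for the final quantum
        # Zero-fill on the right so the group is a full 24 bits.
        n = int.from_bytes(chunk + b"\x00" * (3 - nbytes), "big")
        chars = [ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        # 3 octets -> 4 chars; 2 octets -> 3 chars + "="; 1 octet -> 2 chars + "==".
        keep = nbytes + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


print(b64encode_sketch(b"Man"))  # case (1): TWFu
print(b64encode_sketch(b"M"))    # case (2): TQ==
print(b64encode_sketch(b"Ma"))   # case (3): TWE=
```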

5. Base 64 Encoding with URL and Filename Safe Alphabet

The Base 64 encoding with an URL and filename safe alphabet has been used in [12].

An alternative alphabet has been suggested that would use "~" as the 63rd character. Since the "~" character has special meaning in some file system environments, the encoding described in this section is recommended instead. The remaining unreserved URI character is ".", but some file system environments do not permit multiple "." in a filename, thus making the "." character unattractive as well.

The pad character "=" is typically percent-encoded when used in an URI [9], but if the data length is known implicitly, this can be avoided by skipping the padding; see section 3.2.
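When the padding has been skipped, as it is in JSON Web Tokens, a decoder can restore it because a padded encoding always has a length that is a multiple of 4. A minimal sketch, assuming Python's base64.urlsafe_b64decode does the actual decoding; b64url_decode_unpadded is a hypothetical helper name:

```python
import base64


def b64url_decode_unpadded(text: str) -> bytes:
    """Decode base64url text whose trailing '=' padding may have been dropped."""
    # Append as many '=' as needed to reach a multiple of 4 (0, 1 or 2 for valid input).
    # A remainder of 1 cannot come from whole octets, so such input is malformed.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


print(b64url_decode_unpadded("-_8"))   # b'\xfb\xff'
print(b64url_decode_unpadded("TQ"))    # b'M'
print(b64url_decode_unpadded("-_8="))  # already padded, unchanged: b'\xfb\xff'
```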

This encoding may be referred to as "base64url". This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64". Unless clarified otherwise, "base64" refers to the base 64 in the previous section.
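The warning that base64url is not the same encoding as base64 has a practical side: a decoder for one alphabet will not reliably read the other. A short Python sketch in which a strict standard decoder rejects base64url text that the URL-safe decoder accepts (the variable names are illustrative):

```python
import base64
import binascii

token = "-_8="  # base64url encoding of b"\xfb\xff"

# The URL-safe decoder maps '-' and '_' back to indices 62 and 63.
url_result = base64.urlsafe_b64decode(token)
print(url_result)  # b'\xfb\xff'

# A strict standard decoder treats '-' and '_' as invalid characters.
try:
    base64.b64decode(token, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
print(strict_rejected)  # True
```

By default (validate=False), Python's b64decode discards non-alphabet characters instead of reporting them, so it is safer to pick the decoder that matches the encoding explicitly.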

This encoding is technically identical to the previous one, except for the 62:nd and 63:rd alphabet character, as indicated in Table 2.

     Table 2: The "URL and Filename safe" Base 64 Alphabet

 Value Encoding  Value Encoding  Value Encoding  Value Encoding
     0 A            17 R            34 i            51 z
     1 B            18 S            35 j            52 0
     2 C            19 T            36 k            53 1
     3 D            20 U            37 l            54 2
     4 E            21 V            38 m            55 3
     5 F            22 W            39 n            56 4
     6 G            23 X            40 o            57 5
     7 H            24 Y            41 p            58 6
     8 I            25 Z            42 q            59 7
     9 J            26 a            43 r            60 8
    10 K            27 b            44 s            61 9
    11 L            28 c            45 t            62 - (minus)
    12 M            29 d            46 u            63 _
    13 N            30 e            47 v           (underline)
    14 O            31 f            48 w
    15 P            32 g            49 x
    16 Q            33 h            50 y         (pad) =
