On big-endian machines, isn't UTF-8's byte order different than on little-endian machines? So why doesn't UTF-8 require a BOM?

Views: 25  Published: 2021/12/26 13:55:09  Tags: unicode, utf-8


Question

UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order.
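A minimal sketch in Python of what the quoted statement claims: the UTF-8 form of a character is a fixed sequence of bytes defined by the encoding itself, not by the machine. The sample character and the variable names are illustrative assumptions, not part of the quoted answer.

```python
import codecs
import sys

ch = "\u00e9"  # U+00E9, an arbitrary sample character above 127

# The encoder emits the lead byte first and the continuation byte second,
# whatever the native byte order of the machine is.
encoded = ch.encode("utf-8")
print(encoded.hex())        # c3a9
print(sys.byteorder)        # 'little' or 'big', depending on the machine

# The optional UTF-8 BOM is U+FEFF encoded as UTF-8: a fixed 3-byte signature.
# It marks the data as UTF-8; it carries no byte-order information.
print(codecs.BOM_UTF8.hex())            # efbbbf
with_bom = ch.encode("utf-8-sig")       # same bytes, prefixed with the BOM
print(with_bom.hex())                   # efbbbfc3a9
```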

If UTF-8 stored every code point in a single byte, it would make sense that endianness plays no role and that a BOM isn't required. But code points 128 and above are stored using 2, 3 and up to 6 bytes, which means their byte order on big-endian machines is different than on little-endian machines. So how can we claim that UTF-8 always has the same byte order?

Thanks

UTF-8 is byte oriented

I understand that if a two-byte UTF-8 character C consists of bytes B1 and B2 (where B1 is the first byte and B2 is the last byte), then with UTF-8 those two bytes are always written in the same order. Thus, if this character is written to a file on a little-endian machine LEM, B1 will be first and B2 last. Similarly, if C is written to a file on a big-endian machine BEM, B1 will still be first and B2 still last.

But what happens when C is written to file F on LEM, and we then copy F to BEM and try to read it there? Since BEM automatically swaps bytes (B1 is now the last byte and B2 the first), how will an app running on BEM and reading F know whether F was created on BEM, so that the order of the two bytes wasn't swapped, or whether F was transferred from LEM, in which case BEM automatically swapped the bytes?

I hope this question makes sense

EDIT 2:

In response to your edit: big-endian machines do not swap bytes if you ask them to read a byte at a time.

a) Oh, so even though character C is 2 bytes long, an app (residing on BEM) reading F will read just one byte at a time into memory (thus it will first read B1 into memory and only then B2)?
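The point about reading a byte at a time can be sketched as follows. This is only an illustration under assumed sample data (the two UTF-8 bytes of U+00E9 standing in for B1 and B2): byte order matters only when several bytes are interpreted as one multi-byte integer, which a UTF-8 reader never does.

```python
import struct

data = bytes([0xC3, 0xA9])  # B1, B2 as they sit in the file (assumed sample)

# Reading byte by byte: the order is simply the order in the file,
# on a little-endian and on a big-endian machine alike.
first, second = data[0], data[1]
print(hex(first), hex(second))        # 0xc3 0xa9

# Endianness appears only if the same two bytes are loaded as ONE 16-bit integer.
as_little = struct.unpack("<H", data)[0]   # first byte is the low-order byte
as_big = struct.unpack(">H", data)[0]      # first byte is the high-order byte
print(hex(as_little), hex(as_big))    # 0xa9c3 0xc3a9
```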

In UTF-8, you decide what to do with a byte based on its high-order bits

b) Assume file F has two consecutive characters C and C1 (where C consists of bytes B1 and B2, while C1 has bytes B3, B4 and B5). How will an app reading F know which bytes belong together simply by checking each byte's high-order bits? For example, how will it figure out that B1 and B2 taken together represent a character, and not B1, B2 and B3?
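A rough sketch of how the high-order bits answer this: the first byte of each character announces how many bytes the character occupies, and every following byte of that character starts with the bits 10. The function names and the sample characters below are assumptions made for illustration; this is not a full validating decoder (it does not reject overlong forms, for instance), and it follows the current definition of UTF-8 in RFC 3629, which stops at 4 bytes per character.

```python
from typing import List


def sequence_length(lead: int) -> int:
    """Return how many bytes the character starting with `lead` occupies."""
    if (lead & 0b1000_0000) == 0b0000_0000:   # 0xxxxxxx: 1 byte (ASCII)
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:   # 110xxxxx: 2 bytes
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:   # 1110xxxx: 3 bytes
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:   # 11110xxx: 4 bytes
        return 4
    # 10xxxxxx is a continuation byte and cannot start a character.
    raise ValueError(f"not a lead byte: {lead:#04x}")


def split_characters(data: bytes) -> List[bytes]:
    """Group a UTF-8 byte string into one byte sequence per character."""
    chars = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunk = data[i:i + n]
        # Every byte after the lead byte must look like 10xxxxxx.
        if len(chunk) != n or any((b & 0b1100_0000) != 0b1000_0000 for b in chunk[1:]):
            raise ValueError(f"malformed sequence at offset {i}")
        chars.append(chunk)
        i += n
    return chars


# C = U+00E9 (2 bytes: B1 B2), C1 = U+20AC (3 bytes: B3 B4 B5)
f = "\u00e9\u20ac".encode("utf-8")
print(split_characters(f))  # [b'\xc3\xa9', b'\xe2\x82\xac']
```

Here B1 is 0xC3 (bits 110xxxxx), so the reader knows the character is exactly two bytes long; B3 is 0xE2 (bits 1110xxxx), which starts a new three-byte character and therefore cannot be the tail of C.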

If you believe that you're seeing something different, please edit your question and include

I'm not saying that. I simply didn't understand what was going on.

c) Why aren't UTF-16 and UTF-32 also byte oriented?
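As a hedged illustration of the difference this question is about: UTF-16 and UTF-32 are defined in terms of 16-bit and 32-bit code units, so writing them to a file means choosing an order for the bytes of each unit, whereas a UTF-8 code unit is a single byte. The sample character is an arbitrary assumption.

```python
import codecs

ch = "\u20ac"  # U+20AC, an arbitrary sample character

# One 16-bit or 32-bit code unit, serialized in two possible byte orders.
print(ch.encode("utf-16-be").hex())   # 20ac
print(ch.encode("utf-16-le").hex())   # ac20
print(ch.encode("utf-32-be").hex())   # 000020ac
print(ch.encode("utf-32-le").hex())   # ac200000

# UTF-8 uses 8-bit code units, so there is only one possible serialization.
print(ch.encode("utf-8").hex())       # e282ac

# For UTF-16 the BOM (U+FEFF) tells a reader which order was used.
print(codecs.BOM_UTF16_BE.hex(), codecs.BOM_UTF16_LE.hex())  # feff fffe

# Guessing the wrong order silently yields a different character.
wrong = ch.encode("utf-16-le").decode("utf-16-be")
print(f"U+{ord(wrong):04X}")          # U+AC20
```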
