Is ED A0 80 ED B0 80 a valid UTF-8 byte sequence?

Views: 190 | Published: 2018/12/28 21:22:26 | Tags: java, language-agnostic, unicode, utf-8


Problem description

java.nio.charset.Charset.forName("utf8").decode decodes the byte sequence

 ED A0 80 ED B0 80

into the Unicode code point:

 U+10000

java.nio.charset.Charset.forName("utf8").decode also decodes the byte sequence

 F0 90 80 80

into the Unicode code point:

 U+10000

This is verified by the code at the end of this question.

Now this seems to be telling me that the UTF-8 encoding scheme will decode ED A0 80 ED B0 80 and F0 90 80 80 into the same Unicode code point.
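To see why a lenient decoder could arrive at the same result for both inputs, it may help to look at the raw bit patterns. The sketch below is only arithmetic on the payload bits and performs no validity checking at all; the class and method names (`BitPatternSketch`, `payload3`, `payload4`) are made up for illustration and are not part of any library.

```java
final class BitPatternSketch {
    // Extracts the payload bits of a 3-byte pattern 1110xxxx 10xxxxxx 10xxxxxx.
    // No validity checks are performed; this is plain bit arithmetic.
    static int payload3(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    // Extracts the payload bits of a 4-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
    static int payload4(int b0, int b1, int b2, int b3) {
        return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int hi = payload3(0xED, 0xA0, 0x80);            // first three bytes
        int lo = payload3(0xED, 0xB0, 0x80);            // last three bytes
        int direct = payload4(0xF0, 0x90, 0x80, 0x80);  // the four-byte form

        System.out.println(Integer.toHexString(hi));     // d800
        System.out.println(Integer.toHexString(lo));     // dc00
        // Combining the two values as a UTF-16 surrogate pair gives 10000.
        System.out.println(Integer.toHexString(Character.toCodePoint((char) hi, (char) lo)));
        System.out.println(Integer.toHexString(direct)); // 10000
    }
}
```

So the six bytes carry the values D800 and DC00, which are the UTF-16 surrogate pair for U+10000, while the four bytes carry 10000 directly. Whether a conforming UTF-8 decoder is allowed to accept the first form is exactly what is being asked here.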

However, if I visit https://www.google.com/search?query=%ED%A0%80%ED%B0%80, I can see that it is clearly different from the page https://www.google.com/search?query=%F0%90%80%80

Since Google Search uses the UTF-8 encoding scheme as well (correct me if I'm wrong), this suggests that UTF-8 does not decode ED A0 80 ED B0 80 and F0 90 80 80 into the same Unicode code point(s).

So basically I was wondering: by the official standard, should UTF-8 decode the byte sequence ED A0 80 ED B0 80 into the Unicode code point U+10000?
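One way to approach the question from the other direction is to check which bytes the JDK's own UTF-8 encoder produces for U+10000. This is a minimal sketch, assuming Java 7 or later for `StandardCharsets`; `EncodeSketch`, `hex` and `utf8Hex` are hypothetical helper names introduced only for this example.

```java
import java.nio.charset.StandardCharsets;

final class EncodeSketch {
    // Formats bytes as space-separated upper-case hex, e.g. "F0 90 80 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes a single code point to UTF-8 with the JDK's built-in encoder.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // In a Java String, U+10000 is stored as two UTF-16 code units.
        char[] units = Character.toChars(0x10000);
        System.out.println(Integer.toHexString(units[0]) + " " + Integer.toHexString(units[1])); // d800 dc00

        // The encoder emits the four-byte form, not the six-byte one.
        System.out.println(utf8Hex(0x10000)); // F0 90 80 80
    }
}
```

This only shows what the encoder emits. It does not by itself settle what a decoder is required to accept.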

Code:

public class Test {
    public static void main(String[] args) {
        // Six bytes: two 3-byte sequences whose payloads are the surrogates D800 and DC00.
        java.nio.ByteBuffer bb = java.nio.ByteBuffer.wrap(new byte[] { (byte) 0xED, (byte) 0xA0, (byte) 0x80, (byte) 0xED, (byte) 0xB0, (byte) 0x80 });
        java.nio.CharBuffer cb = java.nio.charset.Charset.forName("utf8").decode(bb);
        // Print every decoded UTF-16 code unit in hex.
        for (int x = 0, xx = cb.limit(); x < xx; ++x) {
            System.out.println(Integer.toHexString(cb.get(x)));
        }
        System.out.println();
        // Four bytes: the 4-byte UTF-8 form of U+10000.
        bb = java.nio.ByteBuffer.wrap(new byte[] { (byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80 });
        cb = java.nio.charset.Charset.forName("utf8").decode(bb);
        for (int x = 0, xx = cb.limit(); x < xx; ++x) {
            System.out.println(Integer.toHexString(cb.get(x)));
        }
    }
}
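Note that `Charset.decode` always replaces malformed input with the charset's replacement string instead of reporting it, so the program above cannot distinguish "accepted" from "silently replaced" except by inspecting the printed values. The variant below is a sketch, assuming Java 7 or later, that asks the decoder to report malformed input via `CharsetDecoder` and `CodingErrorAction.REPORT`. `StrictDecodeSketch`, `strictDecode` and `describe` are hypothetical names, and the outcome for the six-byte input may depend on the JDK version, so no result is claimed for it here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeSketch {
    // Returns the decoded string, or null if the decoder reports the input as invalid.
    static String strictDecode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Human-readable summary of a strictDecode result.
    static String describe(String decoded) {
        if (decoded == null) {
            return "rejected by the decoder";
        }
        if (decoded.isEmpty()) {
            return "accepted, empty result";
        }
        return "accepted, first code point U+"
                + Integer.toHexString(decoded.codePointAt(0)).toUpperCase();
    }

    public static void main(String[] args) {
        byte[] sixBytes = { (byte) 0xED, (byte) 0xA0, (byte) 0x80, (byte) 0xED, (byte) 0xB0, (byte) 0x80 };
        byte[] fourBytes = { (byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80 };
        System.out.println("ED A0 80 ED B0 80: " + describe(strictDecode(sixBytes)));
        System.out.println("F0 90 80 80: " + describe(strictDecode(fourBytes)));
    }
}
```

Running this on the same JDK as the original program shows directly whether that runtime's decoder treats the six-byte sequence as valid input.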
