Conversion between UTF-8 and ISO 8859-1

Views: 330 | Published: 2017/8/17 1:51:26 | Tags: java, encoding, utf-8, iso-8859-1


Question

I found the following code on SO. Does this really work?

    String xml = new String("áéíóúñ");
    byte[] latin1 = xml.getBytes("UTF-8");
    byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

I mean, latin1 is UTF-8 encoded in the second line, but read as ISO-8859-1 encoded in the third? Can this ever work?

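It is not that I want to criticize the cited code. I am just confused, because I ran into some very similar legacy code that seems to work, and I cannot explain why.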

EDIT: I guess that in the original post, "UTF-8" in line 2 was just a typo. But I am not sure...

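EDIT2: After my initial posting, someone edited the code above and changed the second line to byte[] latin1 = xml.getBytes("ISO-8859-1");. I don't know who did that or why, but it clearly caused a lot of confusion. Sorry to everyone who saw the wrong version of the code. The code above is correct now.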

Answer

getBytes(String charsetName), like getBytes(Charset charset), returns a byte array encoded with the given charset, so latin1 actually holds UTF-8 encoded bytes.

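Put System.out.println(latin1.length); as the third line and it will print 12. Each of the six characters takes two bytes in UTF-8 (in ISO-8859-1 each would take one), so the array really is UTF-8 encoded.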
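The length check can be tried in isolation. The following is a minimal sketch, not code from the original thread: the class name EncodedLengths and the helper encodedLength are made up for illustration, and StandardCharsets is used in place of charset-name strings to avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Unicode escapes for "áéíóúñ" keep the example independent of the
    // encoding the source file itself is saved and compiled with.
    static final String XML = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1";

    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Two bytes per character in UTF-8.
        System.out.println(encodedLength(XML, StandardCharsets.UTF_8));      // 12
        // One byte per character in ISO-8859-1.
        System.out.println(encodedLength(XML, StandardCharsets.ISO_8859_1)); // 6
    }
}
```

A length of 12 rather than 6 is what shows that the array named latin1 does not contain Latin-1 data.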

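new String(latin1, "ISO-8859-1") is incorrect, because latin1 is UTF-8 encoded and you are telling Java to parse it as ISO-8859-1. That is why it produces a String made of 12 garbage characters: Ã¡Ã©ÃÃ³ÃºÃ± (the character after the third Ã is U+00AD, a soft hyphen, which is usually invisible). Encoding that string with UTF-8 then yields a 24-byte array, since each of the 12 characters again needs two bytes.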
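The mis-decoding described above can be reproduced with a short sketch. The class name MojibakeDemo and the methods misdecode and repair are hypothetical names chosen for this illustration. The repair step goes beyond what the answer states: it relies on ISO-8859-1 mapping every byte value to exactly one character, so the wrong decoding can be reversed. That may be one reason similar legacy code appears to work, but this is an assumption, not something the thread confirms.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // "áéíóúñ" written with Unicode escapes so the source file encoding does not matter.
    static final String ORIGINAL = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1";

    // The mistake from the question: UTF-8 bytes decoded as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(ORIGINAL);
        // One garbage character per UTF-8 byte.
        System.out.println(garbled.length());                                 // 12
        // Re-encoding the garbage as UTF-8 doubles the size again.
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length);  // 24
        // The wrong decoding is lossless, so it can be undone.
        System.out.println(repair(garbled).equals(ORIGINAL));                 // true
    }
}
```

The byte array named utf8 in the question corresponds to the 24-byte result here: it is valid UTF-8, but it encodes the garbage string rather than the original text.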

I hope everything is clear now.



                    