Conversion between UTF-8 and ISO 8859-1


Question


I found the following code in SO. Does this really work?

String xml = new String("áéíóúñ");
byte[] latin1 = xml.getBytes("UTF-8");
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");

I mean, latin1 is UTF-8-encoded in the second line, but read as ISO-8859-1-encoded in the third? Can this ever work?

I do not mean to criticize the cited code. I am just confused, because I ran into some very similar legacy code that seems to work, and I cannot explain why.

EDIT: I guess that in the original post, "UTF-8" in line 2 was just a typo. But I am not sure ...

EDIT2: After my initial posting, someone edited the code above and changed the 2nd line to byte[] latin1 = xml.getBytes("ISO-8859-1");. I don't know who did that or why, but it clearly caused a lot of confusion. Sorry to all who saw the wrong version of the code. The code above is correct now.

Solution

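getBytes(Charset charset) returns a byte array encoded using that charset, so latin1 actually holds UTF-8 encoded bytes, despite its name.

Put System.out.println(latin1.length); as the third line and it will tell you that the byte array length is 12. Each of the six characters takes two bytes in UTF-8 (in ISO-8859-1 it would take one, giving a length of 6), so the array really is UTF-8 encoded.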
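The length check can be tried in isolation. The following is only a minimal sketch: the class name Utf8LengthCheck and the helper encodedLength are made up for this illustration, and the string is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code from the question.
class Utf8LengthCheck {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The six characters from the question, written as escapes.
        String xml = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1";

        // Two bytes per character in UTF-8.
        System.out.println(encodedLength(xml, StandardCharsets.UTF_8));      // 12
        // One byte per character in ISO-8859-1.
        System.out.println(encodedLength(xml, StandardCharsets.ISO_8859_1)); // 6
    }
}
```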

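new String(latin1, "ISO-8859-1") is incorrect because latin1 is UTF-8 encoded and you are telling Java to parse it as ISO-8859-1. Every byte is then decoded as a separate character, which is why it produces a String made of 12 symbols of garbage: Ã¡Ã©Ã­Ã³ÃºÃ± (the character after the third Ã is U+00AD, a soft hyphen, which is usually invisible).

When you get the bytes of that garbage String using UTF-8 encoding, each of its 12 characters takes two bytes, so the result is a byte array of length 24.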
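The whole chain can be reproduced with a short program. This is a sketch under stated assumptions: the class name MojibakeDemo and the helpers misdecode and repair are hypothetical names introduced here. The repair step is not part of the answer above. It relies on the fact that Java's ISO-8859-1 charset maps each byte value to exactly one character and back, so the mis-decoded String can be turned back into the original bytes. That may be one reason similar legacy code appears to work, although this cannot be confirmed without seeing that code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the code from the question.
class MojibakeDemo {
    // Decode UTF-8 bytes as if they were ISO-8859-1 (the mistake discussed above).
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbage) {
        byte[] rawBytes = garbage.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1"; // the six characters from the question

        byte[] latin1 = xml.getBytes(StandardCharsets.UTF_8);   // misleading name: UTF-8 bytes
        String garbage = misdecode(latin1);                     // one character per byte
        byte[] utf8 = garbage.getBytes(StandardCharsets.UTF_8); // garbage encoded again

        System.out.println(latin1.length);               // 12
        System.out.println(garbage.length());            // 12
        System.out.println(utf8.length);                 // 24
        System.out.println(repair(garbage).equals(xml)); // true
    }
}
```

The repair only works while the mis-decoded characters are still intact. If the garbage String has passed through a step that replaces or drops characters, the original bytes can no longer be recovered this way.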

I hope everything is clear now.
