Why does Java's String.getBytes() use "ISO-8859-1"?

Question

From java.lang.StringCoding:

String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;

This is what String.getBytes() uses on Linux JDK 7. I was always under the impression that UTF-8 is the default charset?

Thanks

Recommended answer

It is a bit complicated ...

Java tries to use the default character encoding to return bytes using String.getBytes().


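  • The default charset is provided by the system file.encoding property.

  • It is cached, and changing it via System.setProperty(..) after the JVM has started has no effect.

  • If the file.encoding property does not map to a known charset, UTF-8 is used.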
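The caching described in the list above can be observed with a small experiment. This is only a sketch: the class and method names (DefaultCharsetDemo, defaultSurvivesPropertyChange) are made up for illustration, and the charset printed depends on your platform and JVM options.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the JDK.
class DefaultCharsetDemo {

    /**
     * Changes file.encoding at runtime and reports whether
     * Charset.defaultCharset() still returns the value seen before.
     */
    static boolean defaultSurvivesPropertyChange() {
        String original = System.getProperty("file.encoding");
        Charset before = Charset.defaultCharset();
        try {
            // Pick a value that differs from the current default.
            String other = "ISO-8859-1".equalsIgnoreCase(before.name()) ? "UTF-8" : "ISO-8859-1";
            System.setProperty("file.encoding", other);
            return before.equals(Charset.defaultCharset());
        } finally {
            // Restore the property so the experiment leaves no trace.
            if (original == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", original);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        System.out.println("unchanged after setProperty: " + defaultSurvivesPropertyChange());
    }
}
```

To actually change the default, pass the property at startup instead, for example java -Dfile.encoding=UTF-8 DefaultCharsetDemo.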

.... Here is the tricky part (which is probably never going to come into play) ....

If the system cannot decode or encode strings using the default charset (UTF-8 or another one), then there will be a fallback to ISO-8859-1. If the fallback does not work ... the system will fail!

.... Really ... (gasp!) ... Could it crash if my specified charset cannot be used, and UTF-8 or ISO-8859-1 are also unusable?

Yes. The Java source comments state in the StringCoding.encode(...) method:


// If we can not find ISO-8859-1 (a required encoding) then things are seriously wrong with the installation.

... and then it calls System.exit(1)

It is possible, although not probable, that the user's JVM does not support decoding and encoding in UTF-8 or in the charset specified on JVM startup.

Then, is the default charset used properly in the String class during getBytes()?

No. However, the better question is ...

The contract as defined in the Javadoc is correct.


The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
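As a rough illustration of what "more control" can look like, the sketch below configures a CharsetEncoder to report unmappable characters instead of silently replacing them. The class and helper names (StrictEncodeDemo, encodeStrict) are invented for this example; only the java.nio.charset calls are real JDK API (StandardCharsets requires Java 7 or later).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
class StrictEncodeDemo {

    /** Encodes text in the given charset, throwing if any character cannot be represented. */
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign, which ISO-8859-1 cannot represent

        // getBytes(Charset) substitutes the charset's replacement byte for unmappable characters.
        byte[] lenient = euro.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("getBytes produced " + lenient.length + " byte(s), first = " + lenient[0]);

        // The strict encoder surfaces the problem as an exception instead.
        try {
            encodeStrict(euro, StandardCharsets.ISO_8859_1);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoding failed: " + e);
        }
    }
}
```

Whether to fail, replace, or ignore is a per-application decision; CodingErrorAction.REPLACE and CodingErrorAction.IGNORE are the other available actions.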


The good news (and better way of doing things)

It is always advisable to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes into Strings or vice versa, unless you have previously obtained the default charset and made 100% sure it is the one you need.

Use this method instead:

public byte[] getBytes(String charsetName)
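A minimal sketch of what explicit charsets look like in practice follows. The class and helper names (ExplicitCharsetDemo, toBytes, toUtf8, fromUtf8) are illustrative only. It also shows the getBytes(Charset) overload with StandardCharsets (Java 7 and later), which avoids the checked UnsupportedEncodingException that the String-name overload declares.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
class ExplicitCharsetDemo {

    /** Encodes using a charset name; the name is checked at runtime. */
    static byte[] toBytes(String text, String charsetName) throws UnsupportedEncodingException {
        return text.getBytes(charsetName);
    }

    /** Encodes as UTF-8 using a Charset constant, so no checked exception is involved. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "h\u00E9llo"; // "héllo": five characters, one of them non-ASCII

        System.out.println(toUtf8(text).length);                // 6: U+00E9 takes two bytes in UTF-8
        System.out.println(toBytes(text, "ISO-8859-1").length); // 5: one byte per character
        System.out.println(fromUtf8(toUtf8(text)).equals(text)); // true: the round trip is lossless
    }
}
```

The same text yields different byte sequences depending on the charset, which is why relying on the platform default makes output vary between machines.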

To find the default for your system, just use:

Charset.defaultCharset()

Hope this helps.
