Why does Java's String.getBytes() use "ISO-8859-1"?


Question

From java.lang.StringCoding:

String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;

This is what String.getBytes() uses. On Linux with JDK 7, I was always under the impression that UTF-8 is the default charset?

Thanks

Recommended answer

It is a bit complicated ...

Java tries to use the default character encoding when String.getBytes() returns bytes.

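  • The default charset is provided by the system file.encoding property.
  • It is cached, so changing it with System.setProperty(..) after the JVM has started has no effect.
  • If the file.encoding property does not map to a known charset, UTF-8 is used.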
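
The caching behaviour in the list above can be checked with a small sketch. The class and method names (`DefaultCharsetDemo`, `defaultIsCached`) are made up for illustration, and the result assumes a JDK that caches the default charset after startup, as the answer describes.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    // Hypothetical helper: returns true if Charset.defaultCharset() is
    // unaffected by changing file.encoding after JVM startup.
    static boolean defaultIsCached() {
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");
        // Pick a charset name that differs from the current default.
        String other = before.name().equals("US-ASCII") ? "UTF-8" : "US-ASCII";
        try {
            System.setProperty("file.encoding", other);
            return Charset.defaultCharset().equals(before);
        } finally {
            // Restore the property so the demo leaves no side effects.
            if (original != null) {
                System.setProperty("file.encoding", original);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("cached:          " + defaultIsCached());
    }
}
```

To actually change the default, the property has to be passed at startup, for example `java -Dfile.encoding=UTF-8 ...`.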

.... Here is the tricky part (which is probably never going to come into play) ....

If the system cannot decode or encode strings using the default charset (UTF-8 or another one), it falls back to ISO-8859-1. If the fallback does not work ... the system will fail!

.... Really ... (gasp!) ... Could it crash if my specified charset cannot be used, and UTF-8 or ISO-8859-1 are also unusable?

Yes. The Java source comments in the StringCoding.encode(...) method state:

// If we can not find ISO-8859-1 (a required encoding) then things are seriously wrong with the installation.

... and then it calls System.exit(1)

It is possible, although not probable, that the user's JVM does not support decoding and encoding in UTF-8 or in the charset specified on JVM startup.

Then, is the default charset used properly in the String class during getBytes()?

No. However, the better question is ...

The contract as defined in the Javadoc is correct.

The behavior of this method when this string cannot be encoded in the default charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
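
As a rough illustration of what "more control" can look like, the sketch below configures a CharsetEncoder to report unmappable characters instead of silently replacing them. The class and method names (`StrictEncodeDemo`, `encodeStrict`) are hypothetical, and ISO-8859-1 is chosen only because it cannot represent the euro sign.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncodeDemo {

    // Hypothetical helper: encodes to ISO-8859-1 and throws instead of
    // substituting a replacement byte for characters that do not fit.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // not representable in ISO-8859-1

        // getBytes(Charset) replaces the character with the charset's
        // replacement byte ('?' for ISO-8859-1) and reports nothing.
        byte[] lossy = euro.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("getBytes produced: " + (char) lossy[0]);

        try {
            encodeStrict(euro);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoder rejected the input: " + e);
        }
    }
}
```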


The good news (and better way of doing things)

It is always advised to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes into Strings or vice versa, unless you have previously obtained the default charset and made 100% sure it is the one you need.

Use this method instead:

public byte[] getBytes(String charsetName)
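
A minimal sketch of encoding with an explicit charset follows. It uses the getBytes(Charset) overload with the StandardCharsets constants (available since Java 7) rather than the String-name overload above, which avoids the checked UnsupportedEncodingException. The class and helper names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Encode with a named charset, independent of the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café"
        // The accented letter takes two bytes in UTF-8 and one in ISO-8859-1.
        System.out.println("UTF-8 bytes:      " + utf8(s).length);   // 5
        System.out.println("ISO-8859-1 bytes: " + latin1(s).length); // 4
    }
}
```

The same string yields different byte sequences depending on the charset, which is why relying on the platform default can give different results on different machines.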

To find the default for your system, just use:

Charset.defaultCharset()

Hope that helps.
