Isn't the size of a character in Java 2 bytes?

Posted: 2016/11/18 11:01:13 · Tags: java, string, char


Question

I used RandomAccessFile to read a byte from a text file.

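import java.io.IOException;
import java.io.RandomAccessFile;

// RandomAccessFile.read throws IOException, so the method must declare (or catch) it
public static void readFile(RandomAccessFile fr) throws IOException {
    byte[] cbuff = new byte[1];
    fr.read(cbuff, 0, 1);                   // reads a single byte
    System.out.println(new String(cbuff));  // decodes it with the platform default charset
}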

Why does this read (and print) one full character?

Answer

A char represents a character in Java (*). It is 2 bytes in size (at least, that's what its valid value range, 0 to 65535, suggests).
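The 2-byte size of char can be checked directly through the constants on java.lang.Character. This is a minimal sketch; the class name CharRange is only illustrative, and Character.BYTES assumes Java 8 or later.

```java
public class CharRange {
    public static void main(String[] args) {
        // Width of the char type: Character.SIZE in bits, Character.BYTES in bytes (Java 8+)
        System.out.println(Character.SIZE);             // 16
        System.out.println(Character.BYTES);            // 2

        // char is an unsigned 16-bit type, so its values run from 0 to 65535
        System.out.println((int) Character.MIN_VALUE);  // 0
        System.out.println((int) Character.MAX_VALUE);  // 65535
    }
}
```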

That doesn't necessarily mean that every representation of a character is 2 bytes long. In fact, many encodings use only 1 byte per character (or use 1 byte for the most common characters).
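To illustrate the difference between the in-memory char and its encoded form, the sketch below encodes one-char strings with a few standard charsets and counts the resulting bytes. The class and helper names (EncodedLength, byteLength) are hypothetical; only String.getBytes(Charset) and StandardCharsets come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {
    // Number of bytes needed to encode s with the given charset
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String a = "A";            // ASCII letter
        String eAcute = "\u00E9";  // é, in the Latin-1 range
        String euro = "\u20AC";    // €, outside the Latin-1 range

        // Each string is exactly one char long in memory...
        System.out.println(a.length() + " " + eAcute.length() + " " + euro.length()); // 1 1 1

        // ...but the encoded size depends on the charset, not on the 2-byte char type
        System.out.println(byteLength(a, StandardCharsets.ISO_8859_1));  // 1
        System.out.println(byteLength(a, StandardCharsets.UTF_8));       // 1
        System.out.println(byteLength(eAcute, StandardCharsets.UTF_8));  // 2
        System.out.println(byteLength(euro, StandardCharsets.UTF_8));    // 3
        System.out.println(byteLength(a, StandardCharsets.UTF_16BE));    // 2
    }
}
```

UTF_16BE is used rather than UTF_16 here because getBytes with UTF_16 prepends a 2-byte byte-order mark, which would obscure the per-character size.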


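When you call the String(byte[]) constructor, you ask Java to convert the byte[] to a String using the platform default encoding. Since the platform default encoding is usually a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8 (since Java 18, UTF-8 is the default on all platforms), it can easily decode that 1 byte into a single character. With UTF-8, that character is the one you expect only when the byte is plain ASCII; any other lone byte decodes to the Unicode Replacement Character.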
If you run that code on a platform that uses UTF-16 (or UTF-32 or UCS-2 or UCS-4 or ...) as the platform default encoding, then you will not get a valid result (you'll get a String containing the Unicode Replacement Character instead).
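The effect of the charset on a single byte can be reproduced without changing the platform default by passing the charset explicitly, which is roughly what new String(cbuff) does with the default one. This is a sketch; DecodeOneByte and decode are hypothetical names, and the byte values are chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOneByte {
    // Decodes a one-byte array with an explicit charset
    static String decode(byte b, Charset cs) {
        return new String(new byte[] { b }, cs);
    }

    public static void main(String[] args) {
        byte letterA = 0x41;        // 'A' in ASCII, ISO-8859-1 and UTF-8
        byte eAcute = (byte) 0xE9;  // 'é' in ISO-8859-1, but only a lead byte in UTF-8

        System.out.println(decode(letterA, StandardCharsets.ISO_8859_1)); // A
        System.out.println(decode(letterA, StandardCharsets.UTF_8));      // A

        // Every byte is a complete character in ISO-8859-1...
        System.out.println(decode(eAcute, StandardCharsets.ISO_8859_1).equals("\u00E9")); // true
        // ...but a lone non-ASCII byte is malformed UTF-8 and becomes U+FFFD
        System.out.println(decode(eAcute, StandardCharsets.UTF_8).equals("\uFFFD"));      // true

        // UTF-16 needs at least 2 bytes per code unit, so one byte is malformed input
        // and is replaced with U+FFFD, the Unicode Replacement Character
        System.out.println(decode(letterA, StandardCharsets.UTF_16).equals("\uFFFD"));    // true
    }
}
```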
That's one of the reasons why you should not depend on the platform default encoding: when converting between byte[] and char[]/String or between InputStream and Reader or between OutputStream and Writer, you should always specify which encoding you want to use. If you don't, then your code will be platform-dependent.
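A minimal sketch of what specifying the encoding looks like for the two conversions mentioned above (byte[] to String, and InputStream to Reader), assuming the text file is UTF-8. The class and method names (ExplicitCharset, readFirstByte, readFirstLine) are hypothetical, and reading a single byte still yields a whole character only for ASCII content.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharset {
    // byte[] -> String: name the charset instead of relying on the platform default
    static String readFirstByte(RandomAccessFile raf) throws IOException {
        byte[] buf = new byte[1];
        int n = raf.read(buf, 0, 1);  // returns -1 at end of file
        return n == 1 ? new String(buf, StandardCharsets.UTF_8) : "";
    }

    // InputStream -> Reader: again, pass the charset explicitly
    static String readFirstLine(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file so the example is self-contained
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        Files.write(tmp, "Hi\n".getBytes(StandardCharsets.UTF_8));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
            System.out.println(readFirstByte(raf));  // H
        }
        System.out.println(readFirstLine(tmp));      // Hi
        Files.delete(tmp);
    }
}
```

Because both the writer and the readers name UTF-8 explicitly, the output no longer depends on which platform the code runs on.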

(*) That's not entirely true: a char represents a UTF-16 code unit. Either one or two UTF-16 code units represent a Unicode code point. A Unicode code point usually represents a character, but sometimes multiple Unicode code points are used to make up a single character. Still, the approximation above is close enough to discuss the topic at hand.
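Both caveats in the footnote can be observed with String.length(), which counts chars (UTF-16 code units), and String.codePointCount(), which counts Unicode code points. The characters below are chosen only as examples; CodeUnits is a hypothetical class name.

```java
public class CodeUnits {
    public static void main(String[] args) {
        // U+1F600 (grinning face) lies outside the BMP, so UTF-16 stores it as a surrogate pair
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                                 // 2 (code units)
        System.out.println(emoji.codePointCount(0, emoji.length()));        // 1 (code point)

        // 'e' followed by U+0301 COMBINING ACUTE ACCENT is displayed as one character (é)
        String combined = "e\u0301";
        System.out.println(combined.length());                              // 2 (code units)
        System.out.println(combined.codePointCount(0, combined.length()));  // 2 (code points)
    }
}
```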
