Is a character 1 byte or 2 bytes in Java?

Views: 433 | Published: 2018/12/26 14:01:41 | Tags: java, string, unicode, ascii, character


Question

I thought characters in Java are 16 bits, as the Java documentation suggests. Isn't that also the case for strings? I have some code that stores an object in a file:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Serializes obj to outFile using Java object serialization.
public static void storeNormalObj(File outFile, Object obj) {
    // try-with-resources closes both streams, even if opening or writing fails.
    try (FileOutputStream fos = new FileOutputStream(outFile);
         ObjectOutputStream oos = new ObjectOutputStream(fos)) {
        oos.writeObject(obj);
        oos.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Basically, I tried to store the string "abcd" in a file named "output". When I opened the file in an editor and deleted the non-string part, what was left was just the string "abcd", which is 4 bytes in total. Does anyone know why? Does Java automatically save space by using ASCII instead of Unicode for strings that ASCII can represent? Thanks.

Recommended answer

(I assume that by "non-string part" you mean the bytes that ObjectOutputStream emits when you create it. You may not want to use ObjectOutputStream at all, but I don't know your requirements.)
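If the goal is simply a text file that holds the string, a minimal sketch without ObjectOutputStream could look like the following. The class name PlainTextFile and the helper utf8FileSize are made up for illustration, and a temporary file stands in for the "output" file from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PlainTextFile {
    // Hypothetical helper: writes text to a temporary file as UTF-8
    // and returns the resulting file size in bytes.
    static long utf8FileSize(String text) {
        try {
            Path file = Files.createTempFile("output", ".txt");
            try {
                // Only the encoded characters are written: no header, no type tags.
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return Files.size(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8FileSize("abcd") + " bytes");
    }
}
```

With this approach the file contains nothing but the encoded text, so there is no extra part to delete in an editor.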

Just FYI, Unicode and UTF-8 are not the same thing. Unicode is a standard that specifies, among other things, which characters are available. UTF-8 is a character encoding that specifies how those characters are physically encoded as bytes. UTF-8 uses 1 byte for each ASCII character (code points up to 127) and 2 to 4 bytes for every other Unicode character.
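To see the difference between a character and its encoded size, here is a small sketch that encodes a few sample strings as UTF-8 and as UTF-16. The class name EncodedLengths and the sample strings are chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "abcd", e with acute accent (U+00E9), euro sign (U+20AC),
        // and an emoji (U+1F600) that needs two Java chars.
        String[] samples = { "abcd", "\u00e9", "\u20ac", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println(s.length() + " char(s): UTF-8 = " + utf8Length(s)
                    + " bytes, UTF-16 = " + utf16Length(s) + " bytes");
        }
    }
}
```

The same string has a different byte count depending on the encoding, which is why "how many bytes is a character" has no single answer once the text leaves memory.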

UTF-8 is a strict superset of ASCII. So even if you specify UTF-8 as the encoding for a file and write "abcd" to it, the file will contain just those four bytes: they have the same physical encoding in ASCII as they do in UTF-8.
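A quick way to check this claim is to encode the same string both ways and compare the bytes. This is only a sketch; the class name AsciiVsUtf8 and the helper sameBytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiVsUtf8 {
    // True when the US-ASCII and UTF-8 encodings of s are byte-for-byte identical.
    static boolean sameBytes(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // The four bytes written for "abcd" in UTF-8.
        System.out.println(Arrays.toString("abcd".getBytes(StandardCharsets.UTF_8)));
        System.out.println(sameBytes("abcd"));   // pure ASCII text
        System.out.println(sameBytes("\u00e9")); // a character outside ASCII
    }
}
```

For text outside ASCII the two encodings differ, because US-ASCII cannot represent the character at all and getBytes substitutes a replacement byte.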

Your method uses ObjectOutputStream, which writes a binary serialization format that is quite different from a plain ASCII or UTF-8 text file. The text of a string is still written inside that format in Java's modified UTF-8, which is why "abcd" shows up as four bytes. Also, if you read the Javadoc carefully, when obj is a String object that has already been written to the stream, subsequent calls to writeObject emit a reference to the earlier string instead, which can mean far fewer bytes are written for repeated strings.
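The effect can be observed by serializing into memory and watching the stream grow. The following is a rough sketch, not a description of the full serialization format; the class name SerializedSize and the helper measure are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SerializedSize {
    // Returns three byte counts: the stream header, the first write of s,
    // and a second write of the same String object.
    static int[] measure(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.flush();
            int header = bytes.size();   // bytes emitted just by creating the stream

            oos.writeObject(s);
            oos.flush();
            int afterFirst = bytes.size();

            oos.writeObject(s);          // same object again: written as a back-reference
            oos.flush();
            int afterSecond = bytes.size();

            return new int[] { header, afterFirst - header, afterSecond - afterFirst };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = measure("abcd");
        System.out.println("header = " + sizes[0] + " bytes");
        System.out.println("first write = " + sizes[1] + " bytes");
        System.out.println("second write = " + sizes[2] + " bytes");
    }
}
```

The first write costs more than the four characters because the format adds a type tag and a length in front of the text; the second write is smaller because it only refers back to the string already in the stream.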

If you're serious about understanding this, you really should spend a good amount of time reading about Unicode and character encoding systems. Wikipedia has an excellent article on Unicode as a starting point.
