How many bytes do English and Chinese characters take in Java?


Problem description

import java.nio.charset.StandardCharsets;

public class TestChar {

    public static void main(String[] args) {
        String cnStr = "龙";
        String enStr = "a";
        // Encode each string as UTF-8 and count the resulting bytes.
        byte[] cnBytes = cnStr.getBytes(StandardCharsets.UTF_8);
        byte[] enBytes = enStr.getBytes(StandardCharsets.UTF_8);

        System.out.println("bytes size of Chinese:" + cnBytes.length);
        System.out.println("bytes size of English:" + enBytes.length);

        // In Java, a char takes two bytes. The question is:
        char cnc = '龙'; // will '龙' take two or three bytes?
        char enc = 'a'; // will 'a' take one or two bytes?
    }
}

Output:

   bytes size of Chinese:3

   bytes size of English:1

Here, my JVM is set to UTF-8. From the output, we can see that the Chinese character '龙' takes 3 bytes and the English character 'a' takes 1 byte. My question is:

In Java, a char takes two bytes. Given char cnc = '龙'; and char enc = 'a';, will cnc take only two bytes instead of three? And will enc take two bytes instead of one?

Recommended answer

UTF-8 is a variable-length character encoding in which each character takes 1 to 4 bytes.
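To see the variable length in practice, here is a minimal sketch that encodes one character from each length class. The class name Utf8Lengths and the helper utf8Length are made up for this illustration; the sample characters other than 'a' and '龙' are my own picks, written as Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class Utf8Lengths {

    // Number of bytes needed to encode the whole string as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // U+0061, Latin letter a
        System.out.println(utf8Length("\u00e9"));       // U+00E9, e with acute accent
        System.out.println(utf8Length("\u9f99"));       // U+9F99, the character from the question
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, an emoji outside the BMP
    }
}
```

Run as written, this prints 1, 2, 3 and 4, one per line.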

A Java char is 16 bits, so cnc and enc each take two bytes as char values; the 3 and 1 in the output are the lengths of their UTF-8 encodings, not the size of a char. See section 3.1, Unicode, in the Java Language Specification to understand exactly how Java handles Unicode.
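The fixed 16-bit size of char can be checked directly. The following is a rough sketch, assuming Java 8 or later (Character.BYTES was added in that release); the class name CharSize and its helper methods are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class CharSize {

    // Number of 16-bit char values (UTF-16 code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes when the string is encoded as UTF-16 without a byte order mark.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);  // 16 bits per char
        System.out.println(Character.BYTES); // 2 bytes per char

        String en = "a";
        String cn = "\u9f99";           // U+9F99, the character from the question
        String emoji = "\uD83D\uDE00";  // U+1F600, needs a surrogate pair

        // Both 'a' and U+9F99 fit in a single char, so each is 2 bytes in UTF-16.
        System.out.println(charCount(en) + " char, " + utf16Length(en) + " bytes");
        System.out.println(charCount(cn) + " char, " + utf16Length(cn) + " bytes");

        // A character outside the Basic Multilingual Plane needs two chars.
        System.out.println(codePointCount(emoji) + " code point, "
                + charCount(emoji) + " chars, " + utf16Length(emoji) + " bytes");
    }
}
```

This only measures char values and encoded byte arrays. How a particular JVM lays out a String object in memory is an implementation detail that the sketch does not cover.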
