Java中字符串的字节数 [英] Bytes of a string in Java

查看:67
本文介绍了Java中字符串的字节数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Java中,如果我有一个字符串x,如何计算该字符串中的字节数?

In Java, if I have a String x, how can I calculate the number of bytes in that string?

推荐答案

字符串是字符的列表(即代码点).表示字符串所需的字节数完全取决于您将其转换为字节所使用的编码.

A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.

也就是说,您可以将字符串转换为字节数组,然后按如下所示查看其大小:

That said, you can turn the string into a byte array and then look at its size as follows:

// The input string for this test
final String string = "Hello World";

// Check length, in characters
System.out.println(string.length()); // prints "11"

// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"

final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"

final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"

final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"

final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"

因此,您看到,即使是简单的"ASCII"字符串,其表示形式也可以具有不同数量的字节,具体取决于所使用的编码.使用您感兴趣的字符集作为getBytes()的参数.并且不要陷入假设UTF-8将每个字符表示为单个字节的陷阱,因为这也不是正确的:

So you see, even a simple "ASCII" string can have different number of bytes in its representation, depending which encoding is used. Use whichever character set you're interested in for your case, as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:

final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms

// Check length, in characters
System.out.println(interesting.length()); // prints "4"

// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"

final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"

final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"

final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")

(请注意,如果您不提供字符集参数,则会使用平台的默认字符集.这在某些情况下可能很有用,但通常应避免使用默认值,并在需要编码/解码时始终使用显式字符集.)

(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)

这篇关于Java中字符串的字节数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆