Java中字符串的字节数 [英] Bytes of a string in Java

查看:23
本文介绍了Java中字符串的字节数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Java 中,如果我有一个字符串 x,我如何计算该字符串中的字节数?

In Java, if I have a String x, how can I calculate the number of bytes in that string?

推荐答案

字符串是字符(即代码点)的列表.用于表示字符串的字节数完全取决于您使用哪种编码将其转换为字节.

A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.

也就是说,您可以将字符串转换为字节数组,然后按如下方式查看其大小:

That said, you can turn the string into a byte array and then look at its size as follows:

// The input string for this test
final String string = "Hello World";

// Check length, in characters
System.out.println(string.length()); // prints "11"

// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"

final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"

final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"

final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"

final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"

所以你看,即使是一个简单的ASCII"字符串在其表示中也可能有不同数量的字节,这取决于使用的编码.使用您对案例感兴趣的任何字符集,作为 getBytes() 的参数.并且不要陷入假设 UTF-8 将 每个 字符表示为单个字节的陷阱,因为这也不是真的:

So you see, even a simple "ASCII" string can have different number of bytes in its representation, depending which encoding is used. Use whichever character set you're interested in for your case, as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:

final String interesting = "uF93DuF936uF949uF942"; // Chinese ideograms

// Check length, in characters
System.out.println(interesting.length()); // prints "4"

// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"

final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"

final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"

final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")

(请注意,如果您不提供字符集参数,则使用平台的默认字符集.这在某些情况下可能很有用,但通常您应该避免依赖默认值,并且在需要编码/解码时始终使用显式字符集.)

(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)

这篇关于Java中字符串的字节数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆