java中字符串的字节数? [英] bytes of a string in java?

查看:148
本文介绍了java中字符串的字节数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在java中,如果我有一个字符串 x ,我如何计算该字符串中的字节数?

In java if i have a String x how can i calculate the number of bytes in that string?

推荐答案

字符串是字符的列表(即代码点)。表示字符串所用的字节数完全取决于您使用哪种编码将其转换为字节

A string is a list of characters (i.e. code points). The number of bytes taken to represent the string depends entirely on which encoding you use to turn it into bytes.

这就是说,你可以转将字符串转换为字节数组,然后按如下所示查看其大小:

That said, you can turn the string into a byte array and then look at its size as follows:

// The input string for this test
final String string = "Hello World";

// Check length, in characters
System.out.println(string.length()); // prints "11"

// Check encoded sizes
final byte[] utf8Bytes = string.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "11"

final byte[] utf16Bytes= string.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "24"

final byte[] utf32Bytes = string.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "44"

final byte[] isoBytes = string.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "11"

final byte[] winBytes = string.getBytes("CP1252");
System.out.println(winBytes.length); // prints "11"

所以你看,即使是简单的ASCII字符串也可以有不同的数字其表示中的字节数,取决于使用的编码。使用您感兴趣的任何字符集作为 getBytes()的参数。并且不要陷入假设UTF-8将每个字符表示为单个字节的陷阱,因为这不是真的:

So you see, even a simple "ASCII" string can have different number of bytes in its representation, depending which encoding is used. Use whichever character set you're interested in for your case, as the argument to getBytes(). And don't fall into the trap of assuming that UTF-8 represents every character as a single byte, as that's not true either:

final String interesting = "\uF93D\uF936\uF949\uF942"; // Chinese ideograms

// Check length, in characters
System.out.println(interesting.length()); // prints "4"

// Check encoded sizes
final byte[] utf8Bytes = interesting.getBytes("UTF-8");
System.out.println(utf8Bytes.length); // prints "12"

final byte[] utf16Bytes= interesting.getBytes("UTF-16");
System.out.println(utf16Bytes.length); // prints "10"

final byte[] utf32Bytes = interesting.getBytes("UTF-32");
System.out.println(utf32Bytes.length); // prints "16"

final byte[] isoBytes = interesting.getBytes("ISO-8859-1");
System.out.println(isoBytes.length); // prints "4" (probably encoded "????")

final byte[] winBytes = interesting.getBytes("CP1252");
System.out.println(winBytes.length); // prints "4" (probably encoded "????")

(注意,如果你不提供字符集参数,使用平台的默认字符集。这在某些情况下可能很有用,但一般情况下应避免依赖默认值,并始终使用显式字符集需要编码/解码。)

(Note that if you don't provide a character set argument, the platform's default character set is used. This might be useful in some contexts, but in general you should avoid depending on defaults, and always use an explicit character set when encoding/decoding is required.)

这篇关于java中字符串的字节数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆