Length of a C++ std::string in bytes
Question
I'm having some trouble figuring out the exact semantics of std::string::length(). The documentation explicitly points out that length() returns the number of characters in the string and not the number of bytes. In which cases does this actually make a difference?
In particular, is this only relevant to non-char instantiations of std::basic_string<>, or can I also get into trouble when storing UTF-8 strings with multi-byte characters? Does the standard allow length() to be UTF-8-aware?
Recommended answer
When dealing with non-char instantiations of std::basic_string<>, sure, length may not equal the number of bytes. This is particularly evident with std::wstring:
std::wstring ws = L"hi";
std::cout << ws.length(); // prints 2; the byte count is 2 * sizeof(wchar_t) (4 when wchar_t is 2 bytes)
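To make the character/byte distinction concrete, here is a minimal sketch of a helper that computes the byte size of the stored character data for any std::basic_string<> instantiation. The name byte_length is hypothetical (it is not part of the standard library); it simply multiplies length() by the size of the character type.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not a standard function: size in bytes of the
// character data held by any basic_string instantiation.
// length() counts elements of type CharT, so bytes = length() * sizeof(CharT).
template <class CharT, class Traits, class Alloc>
std::size_t byte_length(const std::basic_string<CharT, Traits, Alloc>& s) {
    return s.length() * sizeof(CharT);
}
```

For std::string the result equals length(), because sizeof(char) is 1 by definition. For std::wstring the result depends on the platform's sizeof(wchar_t).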
But std::string is about char characters; as far as std::string is concerned, there is no such thing as a multi-byte character, whether you crammed one in at a higher level or not. So std::string::length() is always the number of bytes in the string. Note that if you are cramming multi-byte "characters" into a std::string, your definition of "character" is suddenly at odds with that of the container and of the standard.
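If you do store UTF-8 in a std::string and want the number of code points rather than bytes, you have to count them yourself. The following is a rough sketch, assuming the input is valid UTF-8; the function name utf8_code_points is hypothetical, and the function performs no validation. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so every other byte starts a new code point.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8. A byte that is not a continuation byte (10xxxxxx) marks
// the start of a new code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, with std::string s = "h\xC3\xA9llo" (the word "héllo" encoded as UTF-8), s.length() is 6 while utf8_code_points(s) is 5. Note that code points are still not the same as user-perceived characters, since combining sequences can span several code points.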