std::u16string, std::u32string, std::string, length(), size(), codepoints and characters

Views: 352 · Posted: 2016/10/26 22:13:16 · Tags: c++, unicode

Question

I'm happy to see the std::u16string and std::u32string in C++11, but I'm wondering why there is no std::u8string to handle the UTF-8 case. I'm under the impression that std::string is intended for UTF-8, but it doesn't seem to do it very well. What I mean is, doesn't std::string.length() still return the size of the string's buffer rather than the number of characters in the string?

So, how is the length() method of the standard strings defined for the new C++11 classes? Do they return the size of the string's buffer, the number of codepoints, or the number of characters (assuming a surrogate pair is 2 code points, but one character. Please correct me if I'm wrong)?

And what about size(); isn't it equal to length()? See http://en.cppreference.com/w/cpp/string/basic_string/length for the source of my confusion.

So, I guess, my fundamental question is how does one use std::string, std::u16string, and std::u32string and properly distinguish between buffer size, number of codepoints, and number of characters? If you use the standard iterators, are you iterating over bytes, codepoints, or characters?

Recommended Answer

u16string and u32string are not "new C++11 classes". They're just typedefs of std::basic_string for the char16_t and char32_t types.
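A minimal sketch that makes the "just typedefs" point checkable at compile time: if any of these aliases were a distinct class, the static_assert would fail to compile. It uses only standard <string> and <type_traits>.

```cpp
#include <string>
#include <type_traits>

// The three string types are aliases of one template, differing only in
// the element type T. There is no separate Unicode-aware class here.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "std::u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "std::u32string is basic_string<char32_t>");
```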

length is always equal to size for any basic_string. It is the number of T's in the string, where T is the character type the basic_string is instantiated with.
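To illustrate that size() (and length()) counts elements of T, the sketch below stores the same two code points, U+00E9 (é) and U+1F600 (an emoji outside the Basic Multilingual Plane), in all three string types. The UTF-8 bytes are written as hex escapes so the result does not depend on the source file's encoding; the variable names are only illustrative.

```cpp
#include <string>

// Same two code points, three encodings; size() reports code units of T.
// UTF-8: U+00E9 -> C3 A9 (2 bytes), U+1F600 -> F0 9F 98 80 (4 bytes).
const std::string    utf8  = "\xC3\xA9\xF0\x9F\x98\x80"; // 6 chars
// UTF-16: U+00E9 -> 1 unit, U+1F600 -> surrogate pair D83D DE00 (2 units).
const std::u16string utf16 = u"\u00E9\U0001F600";        // 3 char16_ts
// UTF-32: one unit per code point.
const std::u32string utf32 = U"\u00E9\U0001F600";        // 2 char32_ts
```

Only the UTF-32 string happens to report one element per code point, and even that is not the same as one element per user-perceived character.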

basic_string is not Unicode aware in any way, shape, or form. It has no concept of codepoints, graphemes, Unicode characters, Unicode normalization, or anything of the kind. It is simply an ordered sequence of Ts. The only thing that is Unicode-aware about u16string and u32string is that they use the type returned by u"" and U"" literals. Thus, they can store Unicode-encoded strings, but they do nothing that requires knowledge of said encoding.
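Because basic_string knows nothing about encodings, counting code points is left to the caller. Below is a minimal sketch for a std::string that is assumed to already hold valid UTF-8; the function name count_utf8_code_points is hypothetical. It relies on the fact that every UTF-8 code point begins with exactly one byte that is not a continuation byte (continuation bytes match 10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a std::string ASSUMED to hold
// valid UTF-8. It does no validation; malformed input gives a meaningless count.
std::size_t count_utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (char c : s) {
        // Inspect the raw byte value, independent of whether char is signed.
        unsigned char byte = static_cast<unsigned char>(c);
        // Count only bytes that start a code point (not 10xxxxxx).
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Note that code points are still not "characters" in the user-visible sense: "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points that display as one é. Counting those (grapheme clusters) needs a Unicode library such as ICU, not basic_string.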

Iterators iterate over elements of T, not "bytes, codepoints, or characters". If T is char16_t, then it will iterate over char16_ts. If the string is UTF-16-encoded, then it is iterating over UTF-16 code units, not Unicode codepoints or bytes.
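Since iterating a u16string yields char16_t code units, getting code points means combining each high surrogate (0xD800 to 0xDBFF) with the low surrogate (0xDC00 to 0xDFFF) that follows it. The sketch below does that; the function name utf16_to_code_points is hypothetical, and passing unpaired surrogates through unchanged is a simplifying assumption rather than proper error handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: walks the char16_t code units of a UTF-16 string and
// produces one char32_t per code point by combining surrogate pairs.
// Unpaired surrogates are copied through as-is (no validation).
std::u32string utf16_to_code_points(const std::u16string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t unit = s[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < s.size()) {
            char32_t next = s[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // 10 bits from each surrogate, offset by 0x10000.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i;  // the low surrogate has been consumed too
                continue;
            }
        }
        out.push_back(unit);
    }
    return out;
}
```

For example, u"a\U0001F600" has size() 3 (one unit for 'a', a surrogate pair for the emoji), while the returned u32string has size() 2.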
