Absent std::u8string in C++11

Question

Why does C++11 provide std::u16string and std::u32string but not std::u8string? Do we need to implement UTF-8 encoding ourselves, or use additional libraries?

Answer

C++20 adds char8_t and std::u8string. According to the proposal that introduced them (P0482, "char8_t: A type for UTF-8 characters and strings"), the rationale is:

UTF-8 is the only text encoding mandated to be supported by the C++ standard for which there is no distinct code unit type. Lack of a distinct type for UTF-8 encoded character and string literals prevents the use of overloading and template specialization in interfaces designed for interoperability with encoded text. The inability to infer an encoding for narrow characters and strings limits design possibilities and hinders the production of elegant interfaces that work seamlessly in generic code. Library authors must choose to limit encoding support, design interfaces that require users to explicitly specify encodings, or provide distinct interfaces for, at least, the implementation defined execution and UTF-8 encodings.

Whether char is a signed or unsigned type is implementation defined and implementations that use an 8-bit signed char are at a disadvantage with respect to working with UTF-8 encoded text due to the necessity of having to rely on conversions to unsigned types in order to correctly process leading and continuation code units of multi-byte encoded code points.

The lack of a distinct type and the use of a code unit type with a range that does not portably include the full unsigned range of UTF-8 code units presents challenges for working with UTF-8 encoded text that are not present when working with UTF-16 or UTF-32 encoded text. Enclosed is a proposal for a new char8_t fundamental type and related library enhancements intended to remove barriers to working with UTF-8 encoded text and to enable generic interfaces that work with all five of the standard mandated text encodings in a consistent manner.
