Does std::string in C++ have an encoding format?
Question
I want to find out the default encoding format of std::string.
I have tried to find out the encoding format, but I have no idea.
Does std::string in C++ have an encoding format?
Answer
The simple answer
std::string is defined as std::basic_string<char>, which means it is a sequence of char values (bytes); the type itself neither records nor enforces any encoding. As a sequence of chars, it can hold the bytes produced by encoding text as UTF-8.
The following code works before C++20:
std::string s = u8"1 שלום Hello";
std::cout << s << std::endl;
Output (on a UTF-8 terminal):
1 שלום Hello
The u8 prefix before the quoted string makes it a UTF-8 string literal (https://en.cppreference.com/w/cpp/language/string_literal), telling the compiler to encode the characters of that literal as UTF-8.
Without the u8 prefix, the compiler encodes the string in its execution character set (for GCC the default is UTF-8). So if the default encoding, or an encoding explicitly set for the compiler, supports the characters in the string, the literal can also be written like this:
std::string s = "1 שלום Hello";
std::cout << s << std::endl;
This produces the same output as above.
If the execution character set cannot represent these characters, for example if we set it in GCC to Latin-1 with the flag -fexec-charset=ISO-8859-1, the string without the u8 prefix gives the following compilation error:
converting to execution character set:
Invalid or incomplete multibyte or wide character
std::string s = "1 שלום Hello";
^~~~~~~~~~~~~~
Since C++20, a u8 string literal can no longer be used to initialize a std::string:
std::string s = u8"1 שלום Hello";
std::cout << s << std::endl;
conversion from 'const char8_t [17]' to non-scalar type 'std::string'
{aka 'std::__cxx11::basic_string<char>'} requested
std::string s = u8"1 שלום Hello";
^~~~~~~~~~~~~~~~~
This is because in C++20 the type of a u8 string literal is not const char[SIZE] but const char8_t[SIZE] (the type char8_t was introduced in C++20).
In C++20 you can, however, use the new type std::u8string:
std::u8string s = u8"1 שלום Hello"; // good - std::u8string added in C++20
// std::cout << s << std::endl; // oops, std::ostream doesn't support u8string
Some interesting notes:
- until C++20, a u8 string literal is const char[SIZE]
- since C++20, a u8 string literal is const char8_t[SIZE]
- char8_t has the same size as char, but it is a distinct type
The long story
Encoding is a sad story in C++. This is probably why there is no "simple answer" to your question. There is still no fully fledged, end-to-end standard solution for handling character encoding. There are std converters, third-party libraries, etc., but no truly tight and simple solution. Hopefully C++23 will solve this.
See the CppCon 2019 session by JeanHeyd Meneide.
There is also a related question: How is std::u8string different from std::string?