Does std::string in C++ have an encoding format?


Question

I want to find the default encoding format of std::string.
I have tried to work out the encoding format, but I have no idea. Does std::string in C++ have an encoding format?

Recommended answer

The simple answer

std::string is defined as std::basic_string<char>, which means it is a sequence of chars with no encoding attached to it. As a sequence of chars, it can hold the bytes that result from encoding a string as UTF-8.

The following code is valid until C++20:

std::string s = u8"1 שלום Hello";
std::cout << s << std::endl;

On a console that supports it, this prints:

1 שלום Hello

The u8 prefix before the quoted string marks it as a UTF-8 string literal (https://en.cppreference.com/w/cpp/language/string_literal), telling the compiler to encode the quoted string as UTF-8.

Without the u8 prefix, the compiler reads the literal using the source encoding and stores it in the execution character set. If those encodings (the defaults, or ones set explicitly for the compiler) can represent the characters in the string, it can also be written like this:

std::string s = "1 שלום Hello";
std::cout << s << std::endl;

This gives the same output as above.

If the execution character set cannot represent these characters, for example if we set it in gcc to Latin-1 with the flag -fexec-charset=ISO-8859-1, the string without the u8 prefix gives the following compilation error:

converting to execution character set:
Invalid or incomplete multibyte or wide character 
    std::string s = "1 שלום Hello";
                     ^~~~~~~~~~~~~~

Since C++20, a u8 string literal can no longer be used to initialize a std::string:

std::string s = u8"1 שלום Hello";
std::cout << s << std::endl;

In C++20 this gives the following compilation error:

conversion from 'const char8_t [17]' to non-scalar type 'std::string'
{aka 'std::__cxx11::basic_string<char>'} requested
    std::string s = u8"1 שלום Hello";
                    ^~~~~~~~~~~~~~~~~

This is because the type of a u8 string literal in C++20 is not const char[SIZE] but const char8_t[SIZE] (the type char8_t was introduced in C++20).

In C++20 you can, however, use the new type std::u8string:

std::u8string s = u8"1 שלום Hello"; // good - std::u8string added in C++20
// std::cout << s << std::endl; // oops, std::ostream doesn't support u8string

A few interesting notes:

  1. until C++20, a u8 string literal is const char[SIZE]
  2. from C++20, a u8 string literal is const char8_t[SIZE]
  3. the size of char8_t is the same as char, but it is a distinct type






The long story

Encoding is a sad story in C++. This is probably why there is no "simple answer" to your question. There is still no fully fledged, end-to-end standard solution for handling character encoding. There are std converters, third-party libraries and so on, but no tight and simple solution. Hopefully C++23 will solve this.

See also the CppCon 2019 session on the subject by JeanHeyd Meneide.

There is also a related question: How is std::u8string different from std::string?
