Reading/writing/printing UTF-8 in C++11


Question

I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file saved with UTF-8 encoding.

// Write
std::ofstream("text.txt") << u8"z\u6c34\U0001d10b";

// Read
std::wifstream file1("text.txt");
file1.imbue(std::locale("en_US.UTF8"));
std::cout << "Normal read from file (using default UTF-8/UTF-32 codecvt)\n";
for(wchar_t c; file1 >> c; ) // ?
   std::cout << std::hex << std::showbase << c << '\n';

My question is quite simply, why is a wchar_t needed in the for loop? A u8 string literal can be declared using a simple char * and the bit layout of the UTF-8 encoding should tell the system the character's width. It appears there is some automatic conversion from UTF-8 to UTF-32 (hence the wchar_t), but if this is the case, why is the conversion necessary?

Recommended answer

You use wchar_t because you're reading the file using wifstream; if you were reading using ifstream you'd use char, and similarly for char16_t and char32_t.
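To illustrate the ifstream case, here is a minimal sketch. The helper names (write_sample, read_code_units, count_code_points, check_narrow_read) and the file name are hypothetical, and the sample bytes are spelled out with hex escapes so the literal stays a plain char array under any language standard. A narrow stream performs no conversion, so each char read back is one UTF-8 code unit rather than one character.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: writes the question's sample text as raw UTF-8 bytes.
void write_sample(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    assert(out.is_open());
    out << "z\xe6\xb0\xb4\xf0\x9d\x84\x8b"; // 'z', U+6C34, U+1D10B
}

// Reads the file through a narrow stream: no conversion takes place,
// so the result holds the UTF-8 code units exactly as stored.
std::string read_code_units(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Counts code points by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical self-check on the sample text.
void check_narrow_read() {
    write_sample("utf8_units_demo.txt");
    const std::string bytes = read_code_units("utf8_units_demo.txt");
    assert(bytes.size() == 8);             // 1 + 3 + 4 code units
    assert(count_code_points(bytes) == 3); // three characters
}
```

With char, working out where one character ends and the next begins is left to the program, which is the work that the wide stream's conversion does for you.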

Assuming (as the example does) that wchar_t is 32-bit, and that the native character set that it represents is UTF-32 (UCS-4), then this is the simplest way to read a file as UTF-32; it is presented as such in the example for contrast to reading a file as UTF-16. A more portable method would be to use basic_ifstream<char32_t> and std::codecvt_utf8<char32_t> explicitly, as this is guaranteed to convert from a UTF-8 input stream to UTF-32 elements.
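The UTF-8 to UTF-32 guarantee of std::codecvt_utf8<char32_t> can be sketched without relying on the width of wchar_t. Note the substitution: instead of a basic_ifstream<char32_t>, this sketch applies the same facet through std::wstring_convert to bytes that are already in memory, because the standard does not require a ctype<char32_t> facet, which formatted extraction such as `file >> c` would need for whitespace skipping. The function names are hypothetical, and both std::wstring_convert and std::codecvt_utf8 are deprecated as of C++17.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to UTF-32 code points.
// std::wstring_convert throws std::range_error on malformed input.
std::u32string utf8_to_utf32(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.from_bytes(utf8);
}

// Hypothetical self-check on the question's sample text.
void check_sample() {
    const std::u32string code_points =
        utf8_to_utf32("z\xe6\xb0\xb4\xf0\x9d\x84\x8b");
    assert(code_points.size() == 3);
    assert(code_points[0] == U'z');
    assert(code_points[1] == U'\u6c34');
    assert(code_points[2] == U'\U0001d10b');
}
```

Each element of the result is one whole code point, which is what the loop over wchar_t in the question gets on platforms where wchar_t is 32 bits wide.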
