How to decode UTF-8 text to a readable char array in C++
Question
Hi! I have a file that contains English and Cyrillic words:
\u0074\u0065\u0078\u0074\u0442\u0435\u043a\u0441\u0442
I copy the file contents to a char array using ifstream and its read() method.
What I have tried:
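// Open in binary mode so the bytes are read exactly as stored.
std::ifstream file("d:/example.txt", std::ios::in | std::ios::binary);
char buffer[128] = "";
// Measure the file length by seeking to the end.
file.seekg(0, std::ios::end);
int data_len = (int)file.tellg();
file.seekg(0, std::ios::beg);
// Never read more than the buffer can hold, and keep the terminating '\0'.
if (data_len > (int)sizeof(buffer) - 1) data_len = (int)sizeof(buffer) - 1;
file.read(buffer, data_len);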
When I output the buffer to a MessageBox, it is displayed as is, not decoded.
How can I decode text that contains English and Cyrillic words into a char array?
Recommended answer
Although I have not tried this myself, I believe you need to read the data as wide characters (wchar_t) rather than ordinary bytes (char). Treating each byte as a separate character is probably what garbles the text, because non-ASCII characters such as Cyrillic letters occupy more than one byte in Unicode encodings.
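// Copied from https://stackoverflow.com/a/901617/1762944
// Assumes a UTF-16 file with a 2-byte byte order mark and a 2-byte wchar_t (Windows).
std::ifstream file;
file.open("k:/test.txt", std::ifstream::in | std::ifstream::binary);
wchar_t buffer[2048] = {};  // zero-filled so the text stays null-terminated
file.seekg(2);              // skip the byte order mark
file.read((char*)buffer, sizeof(buffer) - sizeof(wchar_t));
wprintf(L"%ls\n", buffer);
file.close();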
See:
visual c++ - Read Unicode files C++ - Stack Overflow
Wide character - Wikipedia
Encoding Overview - Globalization | Microsoft Docs
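If the file really holds the literal escape sequences shown in the question (a backslash, the letter u, and four hex digits) rather than raw UTF-8 bytes, then widening the read will not decode them; the escapes have to be parsed. The following is only a minimal sketch under that assumption. The names append_utf8, hex_value, and decode_u_escapes are hypothetical helpers invented for this illustration, and surrogate pairs are not combined.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` encoded as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Numeric value of a hex digit, or -1 if `c` is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: replaces every literal \uXXXX escape with the UTF-8
// bytes of that code point. All other characters are copied unchanged.
// Simplification: surrogate pairs (\uD83D\uDE00 and similar) are not combined.
std::string decode_u_escapes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool ok = in[i] == '\\' && i + 6 <= in.size() && in[i + 1] == 'u';
        std::uint32_t cp = 0;
        for (std::size_t k = 2; ok && k < 6; ++k) {
            int v = hex_value(in[i + k]);
            if (v < 0) {
                ok = false;
            } else {
                cp = cp * 16 + static_cast<std::uint32_t>(v);
            }
        }
        if (ok) {
            append_utf8(out, cp);
            i += 6;
        } else {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

Passing the buffer read from the file through decode_u_escapes would yield a UTF-8 std::string. On Windows that string still needs converting to wide characters before a wide-character API such as MessageBoxW can display it.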
Use std::wfstream instead.
See Handling simple text files in C/C++ for converting multi-byte (UTF-8) text to Unicode.
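To make the conversion step concrete: on Windows it is usually done with the MultiByteToWideChar API, but the idea can also be sketched by hand in portable C++. The example below is not taken from the linked article. The function name utf8_to_utf16 is a hypothetical helper, and replacing malformed input with U+FFFD is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into UTF-16 code units.
// Assumption of this sketch: malformed or truncated sequences become U+FFFD.
// Overlong encodings are not rejected, so this is not a strict validator.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // missing or invalid continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF) cp = 0xFFFD;

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above the BMP are split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The result is a std::u16string. On Windows, where wchar_t is 16 bits wide, its code units can be copied into a std::wstring for use with wide-character APIs.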