How to read a UTF-16 text file in C++17


Question

I am very new to C++. I want to read a UTF-16 text file in C++17 in Visual Studio 2019.

I have tried several methods from the internet (including StackOverflow), but none of them worked, and some of them didn't compile (I think they only support older compilers).

I am trying to achieve this without using any third-party libraries.

This reads a text file, but it has some weird characters and spaces between each letter.

// open file for reading
std::wifstream istrm(filename, std::ios::binary);
if (!istrm.is_open()) {
    std::cout << "failed to open " << filename << '\n';
}
else {
    std::wstring s;
    std::getline(istrm, s);
    std::wcout << s << std::endl;
}

Then I found some solutions for this that use the following standard headers:

#include <locale>
#include <codecvt>

// open file for reading
std::wifstream istrm(filename, std::ios::binary);
istrm.imbue(std::locale(istrm.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
if (!istrm.is_open()) {
    std::cout << "failed to open " << filename << '\n';
}
else {
    std::wstring s;
    std::getline(istrm, s);
    std::wcout << s << std::endl;
}

This time it didn't even compile, got the following errors at the std::codecvt_utf16 line:

Error C4996 'std::codecvt_utf16': warning STL4017: std::wbuffer_convert, std::wstring_convert, and the <codecvt> header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.) The C++ Standard doesn't provide equivalent non-deprecated functionality; consider using MultiByteToWideChar() and WideCharToMultiByte() from <Windows.h> instead. You can define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING or _SILENCE_ALL_CXX17_DEPRECATION_WARNINGS to acknowledge that you have received this warning.

I would appreciate it if someone could provide a solution for this.

Thanks.

Recommended answer

First of all, read related questions like Does std::wstring support UTF-16 and UTF-32 on Windows? and Is 16-bit wchar_t formally valid for representing full Unicode?.

If all you want is to read or write strings as a blob whose encoding you already know to be UTF-16, without performing any conversion or manipulation, and you are in an environment such as Visual Studio 2019 on Windows, where wchar_t is intended to hold UTF-16, then you can use the C++ wide strings and streams.
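As a rough illustration of the "read it as a blob" idea, the sketch below reads the raw bytes with a narrow std::ifstream in binary mode and assembles 16-bit code units by hand, so no locale facet or deprecated <codecvt> type is involved. Note that this swaps std::wifstream for a byte stream plus std::u16string; it is one possible approach, not the answer's exact method. The helper name read_utf16_file is hypothetical, and the code assumes the file is little-endian when no byte order mark is present.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: loads a whole UTF-16 file into a std::u16string.
// A leading byte order mark (BOM) selects the endianness and is dropped;
// without a BOM the file is ASSUMED to be little-endian.
std::u16string read_utf16_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("failed to open " + path);
    }

    // Read the raw bytes; binary mode prevents newline translation.
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    if (bytes.size() % 2 != 0) {
        throw std::runtime_error("odd byte count: not valid UTF-16");
    }

    bool big_endian = false;
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        const auto b0 = static_cast<unsigned char>(bytes[0]);
        const auto b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) {
            pos = 2;                      // UTF-16LE BOM
        } else if (b0 == 0xFE && b1 == 0xFF) {
            big_endian = true;            // UTF-16BE BOM
            pos = 2;
        }
    }

    const std::size_t unit_count = (bytes.size() - pos) / 2;
    std::u16string out;
    out.reserve(unit_count);
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned first = static_cast<unsigned char>(bytes[pos]);
        const unsigned second = static_cast<unsigned char>(bytes[pos + 1]);
        // Combine two bytes into one code unit in the detected byte order.
        const unsigned unit = big_endian ? (first << 8) | second
                                         : (second << 8) | first;
        out.push_back(static_cast<char16_t>(unit));
    }
    assert(out.size() == unit_count);
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the result could then be copied into a std::wstring with something like std::wstring w(s.begin(), s.end()) and passed to wide-character APIs. Surrogate pairs are kept as two separate code units; nothing is decoded here.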

Now, if you need to perform conversions, support several encodings, iterate within strings (for some definitions of iterate), or in general anything non-trivial, you are out of luck at the moment if you want to stay within C++17. The C++ Standard committee has established a working group for Unicode, so expect to see some improvements in this area in the upcoming years. For the moment, you will need to use either Win32 functions like MultiByteToWideChar and WideCharToMultiByte, or an external library like International Components for Unicode (ICU) or Boost's Locale.
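To give a sense of what even a single conversion involves, here is a minimal hand-written UTF-16 to UTF-8 sketch in portable C++17. It is not one of the options named above (the Win32 functions, ICU, or Boost.Locale) and is meant only as an illustration. The function names append_utf8 and utf16_to_utf8 are hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends one Unicode code point to `out`, encoded as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 code units to UTF-8. A high surrogate followed by a low
// surrogate is combined into one code point; an unpaired surrogate is
// replaced with U+FFFD (an assumed policy, other choices are possible).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool next_is_low = i + 1 < in.size() &&
                                 in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_high && next_is_low) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}
```

This covers only one direction between two encodings and does no normalization or grapheme handling, which is part of why a platform API or a dedicated library is the usual recommendation for anything beyond simple cases.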
