Read unicode file with special characters using std::wifstream
Problem description
In a Linux environment, I have a piece of code for reading unicode files, similar to the one shown below.
However, special characters (like the Danish letters æ, ø and å) are not handled correctly. For the line 'abcæøåabc', the output is simply 'abc'. Using a debugger I can see that the contents of wline are also only a\000b\000c\000.
#include <fstream>
#include <string>

std::wifstream wif("myfile.txt");
if (wif.is_open())
{
    //set proper position compared to byteorder
    wif.seekg(2, std::ios::beg);
    std::wstring wline;
    while (wif.good())
    {
        std::getline(wif, wline);
        if (!wif.eof())
        {
            std::wstring convert;
            for (auto c : wline)
            {
                if (c != '\0')
                    convert += c;
            }
        }
    }
}
wif.close();
Can anyone tell me how to get it to read the whole line?
Thanks and regards
Recommended answer
You have to use the imbue() method to tell wifstream that the file is encoded as UTF-16, and let it consume the BOM for you. You do not have to seekg() past the BOM manually. For example:
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>

int main()
{
    // open as a byte stream
    std::wifstream wif("myfile.txt", std::ios::binary);
    if (wif.is_open())
    {
        // apply BOM-sensitive UTF-16 facet
        wif.imbue(std::locale(wif.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
        std::wstring wline;
        while (std::getline(wif, wline))
        {
            // the '\0' filter is kept from the question; once the facet
            // decodes UTF-16 it should no longer find anything to strip
            std::wstring convert;
            for (auto c : wline)
            {
                if (c != L'\0')
                    convert += c;
            }
        }
        wif.close();
    }
    return 0;
}
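To make it clearer what the facet does with the BOM and the byte order, here is a minimal sketch that decodes UTF-16 by hand. It may also be useful as a fallback, since <codecvt> and std::codecvt_utf16 are deprecated as of C++17. The helper names decode_utf16 and read_utf16_file are hypothetical, not part of any library. The sketch assumes the whole file fits in memory, treats input without a BOM as big-endian (the same default codecvt_utf16 uses), passes unpaired surrogates through unchanged, and ignores a trailing odd byte.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decode a UTF-16 byte buffer into UTF-32 code points.
// A leading BOM (FF FE or FE FF) selects the byte order and is skipped;
// without a BOM, big-endian is assumed.
std::u32string decode_utf16(const std::string& bytes)
{
    std::size_t pos = 0;
    bool little_endian = false;
    if (bytes.size() >= 2)
    {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) { little_endian = true; pos = 2; }
        else if (b0 == 0xFE && b1 == 0xFF) { pos = 2; }
    }

    // Assemble the 16-bit code unit stored at byte offsets i and i + 1.
    auto unit_at = [&](std::size_t i) -> char32_t {
        const char32_t first = static_cast<unsigned char>(bytes[i]);
        const char32_t second = static_cast<unsigned char>(bytes[i + 1]);
        return little_endian ? (second << 8) | first : (first << 8) | second;
    };

    std::u32string out;
    while (pos + 1 < bytes.size())
    {
        char32_t cp = unit_at(pos);
        pos += 2;
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && pos + 1 < bytes.size())
        {
            const char32_t low = unit_at(pos);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                pos += 2;
            }
        }
        out.push_back(cp);
    }
    return out;
}

// Hypothetical helper: read a whole file as raw bytes and decode it.
std::u32string read_utf16_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return decode_utf16(bytes);
}
```

Calling read_utf16_file("myfile.txt") would return the file contents as one string, which can then be split on U'\n' to get lines. In a little-endian file each ASCII letter is stored as the letter byte followed by a zero byte, which matches the a\000b\000c\000 pattern seen in the debugger when the bytes are not decoded as UTF-16.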