C++ ifstream and "umlauts"

Problem description

I am having an issue with "umlauts" (letters ä, ü, ö, ...) and ifstream in C++.

I use curl to download an html page and ifstream to read in the downloaded file line by line and parse some data out of it. This goes well until I have a line like one of the following:

te="Olimpija Laibach - Tromsö";
te="Burghausen - Münster";

My code parses these lines and outputs them as follows:

Olimpija Laibach vs. Troms?
Burghausen vs. M?nster

Things like outputting umlauts directly from the code work:

cout << "öäü" << endl; // This works fine

My code looks somewhat like this:

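ifstream fin("file");
string line;

// Loop on getline itself: checking eof() before reading would process a
// stale line once more after the final read fails.
while (getline(fin, line)) {
    // Only handle lines that contain the te="..." assignment.
    if (line.find("te=") != string::npos) {
        string::size_type pos = line.find(" - ");
        if (pos == string::npos) continue;  // no team separator on this line
        string team1 = line.substr(4, pos - 4);
        string team2 = line.substr(pos + 3, line.length() - pos - 6);
        cout << team1 << " vs. " << team2 << endl;
    }
}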

The weird thing is that the same code (the only changed things are the source and the delimiters) works for another text input file (same procedure: download with curl, read with ifstream). Parsing and outputting a line like the following is no problem:

<span id="...">Fernwärme Vienna</span>


Recommended answer

What's the locale embedded in fin? In the code you show, it would be the global locale, which, if you haven't reset it, is "C".

If you're anywhere outside the Anglo-Saxon world, and the strings you show suggest that you are, one of the first things you do in main should be

std::locale::global( std::locale( "" ) );

This sets the global locale (and thus the default locale for any streams opened later) to the locale being used in the surrounding environment. (Formally, to an implementation-defined native environment, but in practice, to whatever the user is using.) In the "C" locale, the encoding is almost always ASCII; ASCII doesn't recognize umlauts, and according to the standard, illegal encodings in input should be replaced with an implementation-defined character (IIRC; it's been some time since I actually reread this section). In output, of course, you're not supposed to have any unknown characters, so the implementation doesn't check for them, and they go through.
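
As a rough sketch of the advice above (not code from the original answer), the helpers below install the environment locale as the global one and then report which locale a freshly constructed ifstream carries. The names use_environment_locale and new_stream_locale_name are made up for illustration, and the fallback to the classic "C" locale is an added assumption for systems where std::locale("") throws.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original answer): make the user's
// environment locale the global one. std::locale("") throws
// std::runtime_error if the environment names a locale the system does
// not know, so fall back to the classic "C" locale in that case.
std::locale use_environment_locale() {
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        std::locale::global(std::locale::classic());
    }
    return std::locale();  // copy of the current global locale
}

// Hypothetical helper: report which locale a newly constructed ifstream
// carries. A stream copies the global locale when it is constructed, so
// this reflects whatever was installed before the call.
std::string new_stream_locale_name(const std::string& path) {
    std::ifstream fin(path);
    return fin.getloc().name();
}
```

Calling use_environment_locale() at the top of main, before any ifstream is constructed, and then printing new_stream_locale_name("file") is a quick way to check whether the stream is still running in "C".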

Since std::cin, etc. are opened before you have a chance to set the global locale, you'll have to imbue them with std::locale( "" ) specifically.
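
A minimal sketch of that imbue step might look like the following. The function names are hypothetical, and the fallback to the classic locale is an assumption added so the code still runs where the environment locale is not installed.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: the environment locale if the system accepts it,
// otherwise the classic "C" locale.
std::locale environment_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// The standard streams already exist before main runs, so changing the
// global locale does not affect them; each one has to be imbued explicitly.
void imbue_standard_streams(const std::locale& loc) {
    std::cin.imbue(loc);
    std::cout.imbue(loc);
    std::cerr.imbue(loc);
}
```

In main this would be a single call, imbue_standard_streams(environment_locale_or_classic()), placed before any input or output is done.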

If this doesn't work, you might have to find some specific locale to use.
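
One way to look for a specific locale is to try a few candidate names in order and keep the first one the system accepts. This is only a sketch: first_available_locale is a made-up helper, and locale names are platform-specific, so names such as "de_DE.UTF-8" or "German_Germany.1252" are examples that may not exist on a given machine.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the first locale from `names` that the
// platform can construct, or the classic "C" locale if none of them is
// available.
std::locale first_available_locale(const std::vector<std::string>& names) {
    for (const std::string& name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // This name is not known on this system; try the next one.
        }
    }
    return std::locale::classic();
}
```

The result can then be passed to std::locale::global or to a stream's imbue, for example fin.imbue(first_available_locale({"de_DE.UTF-8", "German_Germany.1252"})), called before the first read from fin.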
