Characters not recognized while reading from file

Problem description

I have the following C++ code in Visual Studio to read characters from a file.

    ifstream infile;
    infile.open(argv[1]);

    if (infile.fail()) {
        cout << "Error reading from file: " << strerror(errno) << endl;
        cout << argv[0] << endl;
    }
    else {
        char currentChar;

        while (infile.get(currentChar)) {
            cout << currentChar << " " << int(currentChar) << endl;
            //... do something with currentChar
        }

        ofstream outfile("output.txt");
        outfile << /* output some text based on currentChar */;
    }
    infile.close();

The file in this case is expected to contain mostly normal ASCII characters, with the exception of two: “ and ”.

The problem is that the code in its current form is not able to recognise those characters. Printing the character with cout outputs garbage, and its int conversion yields a negative number that differs depending on where in the file it occurs.

I have a hunch that the problem is encoding, so I've tried to imbue infile with a locale based on some examples on the internet, but I don't seem to have got it right. infile.get either fails when it reaches the quote character, or the problem remains. What details am I missing?

Recommended answer

The file you are trying to read is likely UTF-8 encoded. Most characters read fine because UTF-8 is backward compatible with ASCII: every ASCII character is stored as the same single byte, while any other character is stored as a sequence of two to four bytes.

To read a UTF-8 file, I'll refer you to this: http://en.cppreference.com/w/cpp/locale/codecvt_utf8

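    #include <fstream>
    #include <iostream>
    #include <sstream>   // std::wstringstream
    #include <string>
    #include <locale>
    #include <codecvt>
    ...

    // Write file in UTF-8.
    // Note: std::locale::empty() is a Visual Studio extension, not standard C++.
    // The locale takes ownership of the facet allocated with new.
    std::wofstream wof;
    wof.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::generate_header>));
    wof.open(L"file.txt");
    wof << L"This is a test.";
    wof << L"This is another test.";
    wof << L"\nThis is the final test.\n";
    wof.close();

    // Read file in UTF-8; consume_header skips a leading byte order mark if present.
    std::wifstream wif(L"file.txt");
    wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));

    // Copy the decoded wide characters into a string stream.
    std::wstringstream wss;
    wss << wif.rdbuf();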
