Getting std::ifstream to handle LF, CR, and CRLF?


Question

Specifically, I'm interested in istream& getline(istream& is, string& str);. Is there an option to the ifstream constructor that tells it to convert all newline encodings to '\n' under the hood? I want to be able to call getline and have it gracefully handle all line endings.

Update: To clarify, I want to write code that compiles almost anywhere and accepts input from almost anywhere, including the rare files that have '\r' without '\n'. The goal is to minimize inconvenience for users of the software.

It's easy to work around the issue, but I'm still curious about the right way, within the standard, to flexibly handle all text file formats.

getline reads a full line, up to a '\n', into a string. The '\n' is consumed from the stream, but getline doesn't include it in the string. That's fine so far, but there might be a '\r' just before the '\n', and that '\r' does end up in the string.

There are three types of line endings seen in text files: '\n' is the conventional ending on Unix machines, '\r' was (I think) used on old Mac operating systems, and Windows uses a pair, '\r' followed by '\n'.

The problem is that getline leaves the '\r' on the end of the string.

ifstream f("a_text_file_of_unknown_origin");
string line;
getline(f, line);
if(!f.fail()) { // a line was read (it may be empty)
   // BUT, there might be a '\r' at the end now.
}
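
The leftover '\r' can be reproduced without a file. This is a minimal sketch, assuming an in-memory std::istringstream stands in for a CRLF file read without any line-ending translation; the helper name readLinesWithGetline is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines using plain std::getline.
std::vector<std::string> readLinesWithGetline(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))   // stops at '\n' only
        lines.push_back(line);
    return lines;
}
```

Calling readLinesWithGetline("one\r\ntwo\r\n") gives two strings, "one\r" and "two\r": the '\n' is consumed, but each '\r' stays in the line.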

Edit: Thanks to Neil for pointing out that f.good() isn't what I wanted. !f.fail() is what I want.

I can remove it manually myself (see the edit to this question), which is easy for Windows text files. But I'm worried that somebody will feed in a file that uses only '\r' as its line ending. In that case, I presume getline will consume the whole file, thinking that it is a single line!
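
That presumption can be checked with a small sketch. It assumes an in-memory stream in place of a real '\r'-only file, and countLines is a made-up helper name. It uses the three-argument overload of std::getline, which takes the delimiter explicitly.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: count how many "lines" std::getline sees
// when it splits the text on the given delimiter.
std::size_t countLines(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line, delim))
        ++n;
    return n;
}
```

With the text "a\rb\rc\r", countLines(text, '\n') sees one line, while countLines(text, '\r') sees three. Passing '\r' as the delimiter only helps when the file format is known in advance, which is exactly what the question wants to avoid.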

.. and that's not even considering Unicode :-)

.. maybe Boost has a nice way to consume one line at a time from any text-file type?

Edit: I'm using this to handle the Windows files, but I still feel I shouldn't have to! And this won't work for the '\r'-only files.

if(!line.empty() && *line.rbegin() == '\r') {
    line.erase( line.length()-1, 1);
}
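
The same workaround can be wrapped so that call sites stay clean. This is only a sketch of that idea; stripTrailingCR and getlineStripCR are hypothetical names, and it still does nothing for '\r'-only files.

```cpp
#include <istream>
#include <string>

// Hypothetical helper: drop one trailing '\r', if present.
void stripTrailingCR(std::string& line)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
}

// Hypothetical wrapper: std::getline followed by the strip above.
std::istream& getlineStripCR(std::istream& is, std::string& line)
{
    if (std::getline(is, line))
        stripTrailingCR(line);
    return is;
}
```

A loop such as while (getlineStripCR(f, line)) then sees LF and CRLF files the same way.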


Recommended answer

As Neil pointed out, "the C++ runtime should deal correctly with whatever the line ending convention is for your particular platform."

However, people do move text files between different platforms, so that is not good enough. Here is a function that handles all three line endings ("\r", "\n" and "\r\n"):

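#include <cstdio>    // EOF
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>

std::istream& safeGetline(std::istream& is, std::string& t)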
{
    t.clear();

    // The characters in the stream are read one-by-one using a std::streambuf.
    // That is faster than reading them one-by-one using the std::istream.
    // Code that uses streambuf this way must be guarded by a sentry object.
    // The sentry object performs various tasks,
    // such as thread synchronization and updating the stream state.

    std::istream::sentry se(is, true);
    std::streambuf* sb = is.rdbuf();

    for(;;) {
        int c = sb->sbumpc();
        switch (c) {
        case '\n':
            return is;
        case '\r':
            if(sb->sgetc() == '\n')
                sb->sbumpc();
            return is;
        case EOF:
            // Also handle the case when the last line has no line ending
            if(t.empty())
                is.setstate(std::ios::eofbit);
            return is;
        default:
            t += (char)c;
        }
    }
}

int main()
{
    std::string path = ...  // insert path to test file here

    std::ifstream ifs(path.c_str());
    if(!ifs) {
        std::cout << "Failed to open the file." << std::endl;
        return EXIT_FAILURE;
    }

    int n = 0;
    std::string t;
    while(!safeGetline(ifs, t).eof())
        ++n;
    std::cout << "The file contains " << n << " lines." << std::endl;
    return EXIT_SUCCESS;
}
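
To check the function without a test file, it can be fed a string that mixes all three endings. The sketch below repeats safeGetline from the answer so that the block compiles on its own; splitLines is a made-up helper, and the in-memory std::istringstream is an assumption standing in for a real file.

```cpp
#include <cstdio>    // EOF
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Copy of safeGetline from the answer above.
std::istream& safeGetline(std::istream& is, std::string& t)
{
    t.clear();
    std::istream::sentry se(is, true);
    std::streambuf* sb = is.rdbuf();
    for (;;) {
        int c = sb->sbumpc();
        switch (c) {
        case '\n':
            return is;
        case '\r':
            if (sb->sgetc() == '\n')   // CRLF: swallow the LF as well
                sb->sbumpc();
            return is;
        case EOF:
            if (t.empty())             // nothing left to report
                is.setstate(std::ios::eofbit);
            return is;
        default:
            t += static_cast<char>(c);
        }
    }
}

// Hypothetical helper: collect every line of the text, whatever its ending.
std::vector<std::string> splitLines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string t;
    while (!safeGetline(in, t).eof())
        lines.push_back(t);
    return lines;
}
```

For "a\nb\r\nc\rd", splitLines returns the four strings "a", "b", "c" and "d", including the last line that has no line ending.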
