C++ regular expression over a stream


Question

I have a very large text file (up to a few hundred MB) that I would like to process with STL regular expressions. The region I want to match spans several lines and occurs at least a few thousand times in the file.

Can I use stream iterators for that purpose? I've tried std::istream_iterator, but with no luck. Could someone post a minimal working example?

Note that I am looking for a solution involving only the STL. Ideally, I would like to iterate over all matches.

Edit

Having read the comments, I understand this is not possible. So maybe there is another way to iterate over the regex matches in a large text file:

#include <regex>
#include <iostream>
#include <string>

const std::string s = R"(Quick brown fox
jumps over
several lines)"; // At least 200MB of multiline text here

int main() {
    // (?:.|\n) is used because '.' does not match a newline in the default ECMAScript grammar.
    const std::regex find_jumping_fox("(Quick(?:.|\\n)+?jump\\S*?)");

    // Visit every non-overlapping match in s and print its text.
    for (std::sregex_iterator it(s.begin(), s.end(), find_jumping_fox), end; it != end; ++it) {
        std::cout << it->str() << '\n';
    }
}

Recommended answer

You can't match on a stream, because what would a failed match mean? Has the start of the regex matched so that more characters need to be streamed in, or has no part of the stream matched?

But after your edit, we can find offsets and ranges of matches on a string. You'll want to use:

const std::vector<std::smatch> foo = { std::sregex_iterator(std::cbegin(s), std::cend(s), find_jumping_fox), std::sregex_iterator() };

It's explained in complete detail here: https://topanswers.xyz/cplusplus?q=729#a845
