Rule to extract key+phrases from a text document


Question

I want to extract key phrases from a document of the form "something KEY phrase END something ... etc". My rule works, but the result does not contain the key name. What should the rule be in order to get the string "KEY phrase"? Thank you for the advice.

std::vector<std::string> doc; 
bool r = qi::phrase_parse(first,last, 
  ( qi::omit[*(qi::char_-"KEY")] 
    >> qi::lexeme[ "KEY"
    >> *(qi::char_-"KEY" -"END")] ) % "END"
, qi::space, doc);


Recommended answer

qi::lit(...) doesn't synthesize an attribute.

qi::string(...) does.

Replace "KEY" with qi::string("KEY"), likely. (It is hard to tell without knowing the type of doc.)

bool r = qi::phrase_parse(first,last, 
  ( qi::omit[*(qi::char_-"KEY")] 
    >> qi::lexeme[ qi::string("KEY")
    >> *(qi::char_-"KEY" -"END")] ) % "END"
, qi::space, doc);

BONUS: See also the seek[] parser directive from the Spirit Repository:

The seek[] parser-directive skips all input until the subject parser matches.

Here's what I'd do:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;

extern std::string const sample; // below

int main() {
    auto f(sample.begin()), l(sample.end());

    std::vector<std::string> phrases;

    if (qi::parse(f,l, *qi::as_string[
                qr::seek[qi::string("KEY")] >> *(qi::char_ - "END")
            ], phrases)) 
    {
        for (size_t i = 0; i < phrases.size(); ++i) 
            std::cout << "keyphrase #" << i << ": '" << phrases[i] << "'\n";
    }
}

Prints:

keyphrase #0: 'KEY@v/0qwJTjgFQwNmose7LiEmAmKpIdK3TPmkCs@'
keyphrase #1: 'KEY@G1TErN1QSSKi17BSnwBKML@'
keyphrase #2: 'KEY@pWhBKmc0sD+o@'
keyphrase #3: 'KEY@pwgjNJ0FvWGRezwi74QdIQdmUuKVyquWuvXz4tBOXqMMqco@'
keyphrase #4: 'KEY@aJ3QUfLh3AqfKyxcUSiDbanZmCNGza6jb6pZ@'
keyphrase #5: 'KEY@bYJzitZUyXlgPA009qBpleHIJ9uJUSdJO78iisUgHkoqUpf+oXZQF9X/7v2fikgemCD@'

Sample data included in a comment in this answer: /here/
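To make the behaviour of the seek[] rule concrete without Spirit, here is a rough hand-written equivalent that uses only the standard library. It is a sketch of the same idea (skip to the next key, then collect characters up to the next end marker), not the Spirit implementation; the function name `extract_key_phrases` and its default arguments are assumptions made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost.Spirit: collects every substring that
// starts at `key` and runs up to (but not including) the next `end` marker.
// If no end marker follows a key, the phrase runs to the end of the text.
std::vector<std::string> extract_key_phrases(std::string const& text,
                                             std::string const& key = "KEY",
                                             std::string const& end = "END") {
    std::vector<std::string> phrases;
    if (key.empty()) return phrases; // guard: an empty key could never advance

    std::string::size_type pos = 0;
    // Skip ahead to the next key, like seek[qi::string("KEY")].
    while ((pos = text.find(key, pos)) != std::string::npos) {
        // Consume everything after the key up to the next end marker,
        // like *(qi::char_ - "END").
        std::string::size_type stop = text.find(end, pos + key.size());
        if (stop == std::string::npos) stop = text.size();
        phrases.push_back(text.substr(pos, stop - pos));
        pos = stop; // continue searching from the end marker
    }
    return phrases;
}
```

Calling `extract_key_phrases("junk KEY one END more KEY two END tail")` returns the two phrases "KEY one " and "KEY two ". The whitespace is preserved because, like the qi::parse call above, nothing is skipped.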
