Use Perl to extract specific output lines

Question

I'm trying to create a system that generalizes rules from input text, and I'm using reVerb to create my initial set of rules. For instance, I run the following command[*]:

$ echo "Bananas are an excellent source of potassium." | ./reverb -q | tr '\t' '\n' | cat -n

This generates output of the form:

    1  stdin
    2  1
    3  Bananas
    4  are an excellent source of
    5  potassium
    6  0
    7  1
    8  1
    9  6
   10  6
   11  7
   12  0.9999999997341693
   13  Bananas are an excellent source of potassium .
   14  NNS VBP DT JJ NN IN NN .
   15  B-NP B-VP B-NP I-NP I-NP I-NP I-NP O
   16  bananas
   17  be source of
   18  potassium

I'm currently piping the output to a file, which includes the leading white space and line numbers shown above.

What I'm really after is just the simple rule at the end, i.e. lines 16, 17 and 18. I've been trying to write a script that extracts just that component and writes it to a new file as a Prolog clause, i.e. be source of(bananas, potassium).

Is that feasible? Can Prolog rules contain white space like that?

I think I'm stuck with getting all of that output from reVerb, so what would be the best way to extract the component I want? A Perl script? Or maybe sed?

*Later I plan to replace this with a larger input file rather than single sentences.

Recommended answer

Converting the tabs to newlines and numbering the lines seems wasteful. Why not leave the tabs as they are and select the fields directly:

$ echo "Bananas are an excellent source of potassium." \
  | ./reverb -q | cut --fields=16,17,18
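To see what that pipeline keeps without running reVerb, here is a small sketch. The function name `rule_fields` is made up for illustration, and `-f 16-18` is the short POSIX spelling of the same field selection (`--fields` is the GNU long option). It assumes each extraction arrives as one tab-separated line with at least 18 fields, as the numbered output in the question suggests.

```shell
# Hypothetical helper: keep only fields 16-18 of each tab-separated line
# read from stdin. Tab is cut's default delimiter, so no -d is needed.
rule_fields() {
  cut -f 16-18
}

# Intended use (assumes ./reverb is available as in the question):
#   echo "Bananas are an excellent source of potassium." | ./reverb -q | rule_fields
```

The result is still a single line with the three values separated by tabs, which is convenient input for a later formatting step.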

And yes, you can have rules like this in Prolog; see the answer by @mat. I'd guess you need to learn a bit of Prolog before you move on.

It is easier, however, to just make the string a valid name for a predicate:

  • be_source_of with underscores instead of spaces
  • or 'be source of' with spaces, and enclosed in single quotes.
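Both spellings can be produced mechanically. The following is a minimal sketch; the helper names `underscore_atom` and `quoted_atom` are hypothetical. It assumes the relation string is lowercase and contains only letters, digits and spaces, as in the example output; anything else (an uppercase first letter, punctuation, an embedded single quote) would need extra handling before it is a valid Prolog atom.

```shell
# Hypothetical helper: replace each space with an underscore,
# e.g. "be source of" becomes be_source_of.
underscore_atom() {
  printf '%s\n' "$1" | tr ' ' '_'
}

# Hypothetical helper: wrap the string in single quotes so the spaces can stay.
# Does not escape single quotes that occur inside the string.
quoted_atom() {
  printf "'%s'\n" "$1"
}
```

The quoted form keeps the original wording intact, while the underscore form is easier to type in queries; which one suits better depends on how the clauses will be used later.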

You can probably use awk to do what you want with the three fields; see, for example, awk's printf statement. Alternatively, you can parse the output again directly from Prolog. Both are beyond the scope of your current question, I feel.
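As a rough sketch of the awk route, something like the following could turn each reVerb line into a Prolog fact. The function name `to_prolog_fact` is made up for illustration. It assumes one extraction per tab-separated line, with the first argument, the relation and the second argument in fields 16, 17 and 18, as the numbered output in the question suggests. All three values are wrapped in single quotes so that spaces are allowed; single quotes inside a field are not escaped here.

```shell
# Hypothetical helper: read reVerb-style tab-separated lines on stdin and
# print one Prolog fact per line that has at least 18 fields.
to_prolog_fact() {
  awk -F'\t' -v q="'" '
    # Wrap a string in single quotes (no escaping of embedded quotes).
    function quote(s) { return q s q }
    NF >= 18 { printf "%s(%s, %s).\n", quote($17), quote($16), quote($18) }
  '
}

# Intended use (assumes ./reverb is available as in the question):
#   echo "Bananas are an excellent source of potassium." | ./reverb -q | to_prolog_fact > rules.pl
```

Because awk processes its input line by line, the same function should work unchanged when the single sentence is replaced by a larger input file.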
