Creating matching brackets: awk / sed

Question

I have a data set that contains three patterns:

First:

abrasion abrade:stem<>ion:suffix
abstainer abstain:stem<>er:suffix
abstention abstain:stem<>ion:suffix

Second:

inaccurate in:prefix<>accurate:stem
inactive in:prefix<>active:stem

Third:

incommunicable in:prefix<>communicate:stem<>able:suffix
incompatibility in:prefix<>compatible:stem<>ity:suffix

I need to convert the above into the following form, with the brackets matched in the style of the Penn Treebank (http://languagelog.ldc.upenn.edu/myl/PennTreebank1995.pdf):

First:

abrasion ((abrade:stem) ion:suffix)
abstainer ((abstain:stem)er:suffix)
abstention ((abstain:stem)ion:suffix)

Second:

inaccurate (in:prefix(accurate:stem))
inactive (in:prefix(active:stem))

Third:

incommunicable (in:prefix ((communicate:stem)able:suffix))
incompatibility (in:prefix ((compatible:stem)ity:suffix))

The code I am working on uses awk:

{
    n = gsub(/<>/,")",$2)
    s = sprintf("%*s",n,"")
    gsub(/ /,"(",s)
    print "(" $1, s "((" $2 "))"
}

Edit

A more complex form:

nationalistic national:stem<>ism:suffix<>ist:suffix<>ic:suffix

To:

nationalistic ((((national:stem)ism:suffix)ist:suffix)ic:suffix)

My code does not produce the expected output shown in the examples.

Recommended answer

This should be general enough, since it matches on :stem, :prefix, and :suffix. It relies on gensub, which is a GNU awk (gawk) extension:

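awk 'BEGIN{FS=OFS="\n"}{
  # wrap every stem: xxx:stem -> (xxx:stem)
  a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
  # attach one suffix to the bracketed stem and wrap the pair
  b=gensub(/(\([a-zA-Z]*:stem\))<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
  # wrap a prefix together with everything that follows it
  c=gensub(/([a-zA-Z]*:prefix)<>(.*)/,"(\\1\\2)", "g", b);
  print c;}' testfile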

Demo here: https://ideone.com/U3ux91
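
Since the question also mentions sed, the same three substitutions can be sketched with sed. This is only an illustration, not part of the original answer: the function name bracket_sed is made up here, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). Like the awk version above, it handles at most one suffix per stem.

```shell
# Hypothetical helper: the three gensub calls rewritten as sed substitutions.
bracket_sed() {
  sed -E \
    -e 's/([a-zA-Z]*):stem/(\1:stem)/g' \
    -e 's/(\([a-zA-Z]*:stem\))<>([a-zA-Z]*):suffix/(\1\2:suffix)/g' \
    -e 's/([a-zA-Z]*:prefix)<>(.*)/(\1\2)/' \
    "$@"
}

# Sample call: reads from stdin when no file name is given.
printf '%s\n' 'incompatibility in:prefix<>compatible:stem<>ity:suffix' | bracket_sed
```

For the sample line this prints incompatibility (in:prefix((compatible:stem)ity:suffix)).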

Edit

This should take care of multiple suffixes and prefixes:

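awk 'BEGIN{FS=OFS="\n"}{
   # wrap every stem: xxx:stem -> (xxx:stem)
   a=gensub(/([a-zA-Z]*):stem/,"(\\1:stem)", "g");
   # each pass brackets one more suffix; the literal ) is escaped, and
   # .* replaces .*? because awk regexes have no lazy quantifiers
   while ( a ~ /stem\)<>.*:suffix/) {
     a=gensub(/(\([a-zA-Z]*:stem\).*)<>([a-zA-Z]*):suffix/,"(\\1\\2:suffix)", "g", a);
   }
   # each pass brackets one prefix; assumes every remaining <> follows
   # a :prefix, otherwise this loop does not terminate
   while ( a ~ /<>/) {
     a=gensub(/([a-zA-Z]*:prefix)<>(.*)/,"(\\1\\2)", "g", a);
   }
   print a;}' test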

Demo here: https://ideone.com/U7LYXi (sorry if antinationalistic is not a word, but it will do for testing)
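
Because gensub is specific to gawk, a variant that avoids regex rewriting may be useful on systems with another awk. The sketch below is not from the original answer: the function name bracket is made up here, and it assumes each line holds a word followed by one analysis field with no spaces in it and exactly one :stem part. It splits the analysis on <>, brackets the stem, then wraps suffixes outward from the stem and prefixes outward from the stem in the other direction, using only split and string concatenation.

```shell
# Hypothetical helper: build the nesting by hand instead of with gensub.
bracket() {
  awk '{
    n = split($2, part, "<>")            # pieces such as in:prefix, x:stem, y:suffix
    stem = 0
    for (i = 1; i <= n; i++)
      if (part[i] ~ /:stem$/) stem = i   # remember where the stem sits
    if (stem == 0) { print; next }       # no stem found: pass the line through
    res = "(" part[stem] ")"
    for (i = stem + 1; i <= n; i++)      # suffixes, nearest to the stem first
      res = "(" res part[i] ")"
    for (i = stem - 1; i >= 1; i--)      # prefixes, nearest to the stem first
      res = "(" part[i] res ")"
    print $1, res
  }' "$@"
}

# Sample call: reads from stdin when no file name is given.
printf '%s\n' 'nationalistic national:stem<>ism:suffix<>ist:suffix<>ic:suffix' | bracket
```

For the sample line this prints nationalistic ((((national:stem)ism:suffix)ist:suffix)ic:suffix). Lines without a :stem part are printed unchanged, which is a design choice of this sketch rather than something the question specifies.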
