sed: hold pattern and rearrange line


Problem description

I am not sure if I can do this purely with sed:

I am trying to rearrange lines like this

GF:001,GF:00012,GF:01223<TAB>XXR
GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3

into lines like this:

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

Does anyone have any hints? The number of GF:XXXX entries varies from line to line, as does the length of each GF:XXXX entry.

I am stuck with sed -n '/\(XX.*\)$/ { s/,/\t\1\n/ }' input, but I cannot refer back to the pattern that was matched in the address. Any ideas? Cheers!

Update: I don't think it is possible to do this with sed alone, so I used Perl instead:

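perl -e 'open(my $in, "<", "file") or die "Cannot open file: $!";
while (<$in>) {
    my @a = split(/\t/);          # $a[1] keeps its trailing newline
    my @gos = split(/,/, $a[0]);  # one element per GF entry
    foreach (@gos) {
        print $_ . "\t" . $a[1];
    }
}
close($in);' > output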

But if anyone knows a way to solve this just with sed please post it here...

Recommended answer

It can be done in sed, though I probably would use Perl (or Awk or Python) to do it.

I claim no elegance for this solution, but brute force and ignorance sometimes pay off. I created a file called, unoriginally, sed.script containing:

/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}

I ran it as:

sed -f sed.script input

where input contained the two lines shown in the question. It produced the output:

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

(I took the liberty of deliberately misinterpreting <TAB> to be a 5-character sequence instead of a single tab character; you can easily fix the answer to handle an actual tab character instead.)
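
One way to make that fix is sketched below. It assumes a POSIX shell and a sed that accepts several -e fragments forming one script (GNU and BSD sed both do). The helper name split_gf and the variable tab are illustrative only and are not part of the original answer. The tab is produced with printf because \t inside a sed regular expression is not portable.

```shell
#!/bin/sh
# Same loop as sed.script, but matching a real tab instead of the text "<TAB>".
tab=$(printf '\t')   # literal tab character, obtained portably

# split_gf: hypothetical helper; reads lines on stdin, writes one GF entry per line.
split_gf() {
    sed -e "/\(GF:[0-9]*\),\(.*\)${tab}\(.*\)/{" \
        -e ':redo' \
        -e "s/\(GF:[0-9]*\),\(.*\)${tab}\(.*\)/\1${tab}\3@@@@@\2${tab}\3/" \
        -e 'h' \
        -e 's/@@@@@.*//' \
        -e 'p' \
        -e 'x' \
        -e 's/.*@@@@@//' \
        -e 't redo' \
        -e 'd' \
        -e '}'
}

# Example: feed one tab-separated line through the helper.
printf 'GF:001,GF:00012,GF:01223\tXXR\n' | split_gf
```

Only the two expressions that mention the separator are double-quoted, so that the shell expands ${tab}; the backslashes in them reach sed unchanged.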

Explanation of the sed script:

  • Find lines with more than one occurrence of GF:nnn separated by commas (lines containing a single occurrence need no processing). Run the rest of the script only on such lines; anything else is passed through (printed) unchanged.
  • Create a label (redo) so we can branch back to it.
  • Split the line into three remembered parts: the first GF entry, the remaining GF entries, and the field after the <TAB>. Replace the line with: first part, <TAB>, third part, an implausible marker pattern (@@@@@), second part, <TAB>, third part.
  • Copy the modified line to the hold space (h).
  • Delete everything from the marker pattern to the end of the line.
  • Print (p).
  • Swap the hold space back into the pattern space (x).
  • Remove everything up to and including the marker pattern.
  • If any substitution has succeeded since the last branch (t), go back to the redo label.
  • Delete what's left (it was printed already) and move on to the next input line (d).
  • End of script block.
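
To see the marker step in isolation, the substitution from the script can be run on its own. This is only an illustration of the third step above, using the literal <TAB> text as the answer does; mark_first is a hypothetical helper name.

```shell
#!/bin/sh
# mark_first: hypothetical helper that applies only the marker substitution.
# Everything before @@@@@ is the line to print now; everything after it is
# the remainder that the loop will process on its next pass.
mark_first() {
    sed 's/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/'
}

printf '%s\n' 'GF:001,GF:00012,GF:01223<TAB>XXR' | mark_first
# prints: GF:001<TAB>XXR@@@@@GF:00012,GF:01223<TAB>XXR
```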

This is a simple loop that reduces the number of GF entries remaining on the line by one on each iteration.
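
For comparison, the Awk route mentioned at the start of the answer might look like the sketch below. It is not from the original answer: it assumes real tab separators in the input, and split_gf_awk is a hypothetical helper name. Lines without a tab are printed unchanged, mirroring what the sed script does with lines it does not match.

```shell
#!/bin/sh
# split_gf_awk: hypothetical helper; reads tab-separated lines on stdin.
split_gf_awk() {
    awk -F '\t' -v OFS='\t' '
        NF < 2 { print; next }            # no tab: pass the line through
        {
            n = split($1, ids, ",")       # one array element per GF entry
            for (i = 1; i <= n; i++)
                print ids[i], $2          # pair each entry with the second field
        }'
}

printf 'GF:001,GF:00012\tXXR\n' | split_gf_awk
```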
