sed: hold pattern and rearrange line

Views: 107 | Published: 2016/7/28 16:47:55 | Tags: regex, variables, sed, awk

Question

I am not sure if I can do this purely with sed:

I am trying to rearrange lines like this

GF:001,GF:00012,GF:01223<TAB>XXR
GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3

to

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

Any hints? The number of GF:XXXX entries varies from line to line, and so does the length of each GF:XXXX entry.

I am stuck with sed -n ' '/$XX.*$$/' { s/,/\t\1\n/ }' input, but I cannot refer back to the originally matched pattern. Any ideas? Cheers!

Update: I don't think this can be done with sed alone, so I used Perl instead:

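perl -e 'open(my $in, "<", "file") or die "cannot open file: $!";
while (<$in>) {
    chomp;
    my @a = split(/\t/);              # $a[0]: comma-separated GF list, $a[1]: trailing field
    my @gos = split(/,/, $a[0]);
    foreach my $go (@gos) {
        print "$go\t$a[1]\n";         # one output line per GF entry
    }
}
close($in);' > output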

But if anyone knows a way to solve this just with sed please post it here...

Recommended answer

It can be done in sed, though I probably would use Perl (or Awk or Python) to do it.

I claim no elegance for this solution, but brute force and ignorance sometimes pays off. I created a file called, unoriginally, sed.script containing:

/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}

I ran it as:

sed -f sed.script input

where input contained the two lines shown in the question. It produced the output:

GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3

(I took the liberty of deliberately misinterpreting <TAB> to be a 5-character sequence instead of a single tab character; you can easily fix the answer to handle an actual tab character instead.)

Explanation of the sed script:

1. Find lines with more than one occurrence of GF:nnn separated by commas (lines that contain a single occurrence need no processing). Run the rest of the script only on such lines. Anything else is passed through (printed) unchanged.
2. Create a label (:redo) so we can branch back to it.
3. Split the line into 3 remembered parts. The first part is the initial GF information; the second part is any other GF information; the third part is the field after the <TAB>. Replace the line with: first part, <TAB>, third part, an implausible marker pattern (@@@@@), second part, <TAB>, third part.
4. Copy the modified line to the hold space (h).
5. Delete everything from the marker pattern to the end of the line.
6. Print (p).
7. Swap the hold space into the pattern space (x).
8. Remove everything up to and including the marker pattern.
9. If any substitution succeeded, branch back to the redo label (t redo).
10. Delete what is left (d); it was printed already.
11. End of the script block.
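The steps above can be traced outside sed. The following is a minimal Python sketch, not part of the original answer, that mimics the script's pattern space and hold space for a single input line. The function name sed_loop and the sep parameter are made up for illustration; sep defaults to a real tab, and passing "<TAB>" reproduces the literal 5-character separator used in the answer.

```python
import re

MARKER = "@@@@@"  # the same implausible marker the sed script uses


def sed_loop(line, sep="\t"):
    """Mimic the sed script for one input line; return the lines it would print."""
    # Equivalent of the address /\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/
    split_first = re.compile(r"(GF:[0-9]*),(.*)" + re.escape(sep) + r"(.*)")
    if not split_first.search(line):
        return [line]  # non-matching lines pass through unchanged

    printed = []
    pattern_space = line
    while True:  # :redo
        # s/.../\1<TAB>\3@@@@@\2<TAB>\3/  (first entry + label, marker, the rest)
        pattern_space, n_split = split_first.subn(
            lambda m: m.group(1) + sep + m.group(3) + MARKER
            + m.group(2) + sep + m.group(3),
            pattern_space, count=1)
        hold_space = pattern_space                             # h
        pattern_space = pattern_space.split(MARKER, 1)[0]      # s/@@@@@.*//
        printed.append(pattern_space)                          # p
        pattern_space, hold_space = hold_space, pattern_space  # x
        # s/.*@@@@@//  (keep only what follows the marker)
        pattern_space, n_strip = re.subn(
            ".*" + re.escape(MARKER), "", pattern_space, count=1)
        if not (n_split or n_strip):                           # t redo
            break
    return printed  # d: the remaining pattern space is discarded


for out in sed_loop("GF:001,GF:00012,GF:01223<TAB>XXR", sep="<TAB>"):
    print(out)
```

On the last pass no substitution succeeds, so the final entry is printed once and the loop ends, which is why the script needs the closing d to avoid printing it a second time.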

This is a simple loop that reduces the number of the patterns by one on each iteration.
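For comparison, here is a rough sketch of the non-sed route the answer says it would normally prefer, written in Python. It is an illustration only: the function name rearrange and the sep parameter are assumptions, and it takes the fields to be separated by a real tab unless told otherwise.

```python
def rearrange(line, sep="\t"):
    """Turn 'GF:a,GF:b<sep>LABEL' into one 'GF:x<sep>LABEL' string per entry."""
    line = line.rstrip("\n")
    if sep not in line:
        return [line]  # nothing to split; pass the line through unchanged
    ids, label = line.split(sep, 1)
    return [gf + sep + label for gf in ids.split(",")]


# The two sample lines from the question, with real tab characters.
sample = [
    "GF:001,GF:00012,GF:01223\tXXR",
    "GF:001,GF:00012,GF:01223,GF:0666\tXXXR3",
]
for raw in sample:
    for out in rearrange(raw):
        print(out)
```

This does the same job as the Perl one-liner in the question: split once on the separator, then emit the trailing field after each comma-separated entry.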
