sed: hold pattern and rearrange line
Question
I am not sure if I can do this purely with sed:
I am trying to rearrange lines like this
GF:001,GF:00012,GF:01223<TAB>XXR
GF:001,GF:00012,GF:01223,GF:0666<TAB>XXXR3
to
GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3
Does anyone have any hints? The number of GF:XXXX entries varies from line to line, as does the length of each GF:XXXX entry.
I am stuck with
sed -n '
'/\(XX.*\)$/' {
s/,/\t\1\n/
}' input
but I cannot refer back to the originally matched pattern in the first place. Any ideas? Cheers!
Update: I think it is not possible to do this using sed alone, so I used Perl instead:
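perl -e 'open(IN, "< file") or die "cannot open file: $!";
while (<IN>) {
    @a = split(/\t/);            # $a[0]: comma-separated GF list, $a[1]: label (keeps its newline)
    @gos = split(/,/, $a[0]);    # one element per GF entry
    foreach (@gos) {
        print $_."\t".$a[1];     # emit one line per GF entry
    }
}
close( IN );' > output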
But if anyone knows a way to solve this with sed alone, please post it here...
Recommended answer
It can be done in sed, though I would probably use Perl (or Awk or Python) to do it.
I claim no elegance for this solution, but brute force and ignorance sometimes pay off. I created a file called, unoriginally, sed.script containing:
/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}
I ran it as:
sed -f sed.script input
where input contained the two lines shown in the question. It produced the output:
GF:001<TAB>XXR
GF:00012<TAB>XXR
GF:01223<TAB>XXR
GF:001<TAB>XXXR3
GF:00012<TAB>XXXR3
GF:01223<TAB>XXXR3
GF:0666<TAB>XXXR3
(I took the liberty of deliberately misinterpreting <TAB> to be a 5-character sequence instead of a single tab character; you can easily fix the answer to handle an actual tab character instead.)
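For input that uses a real tab character, one possible adaptation is sketched below. It is not part of the original answer: the function name `split_gf` and the variable `tab` are made up for illustration, and the sketch assumes each line has exactly one tab and never contains the marker `@@@@@`. The tab is produced with `printf` because `\t` inside a sed regular expression is not portable across sed implementations.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): same loop as sed.script,
# but with a literal tab character in place of the 5-character text <TAB>.
split_gf() {
    tab=$(printf '\t')   # a real tab; command substitution only strips newlines
    sed "
/\(GF:[0-9]*\),\(.*\)${tab}\(.*\)/{
:redo
s/\(GF:[0-9]*\),\(.*\)${tab}\(.*\)/\1${tab}\3@@@@@\2${tab}\3/
h
s/@@@@@.*//
p
x
s/.*@@@@@//
t redo
d
}
" "$@"
}
```

It could then be called as `split_gf input > output`, or used at the end of a pipeline, since the file argument is optional.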
Explanation of the sed script:
- Find lines with more than one occurrence of GF:nnn separated by commas (we do not need to process lines that contain a single such occurrence). Do the rest of the script only on such lines. Anything else is passed through (printed) unchanged.
- Create a label so we can branch back to it.
- Split the line into 3 remembered parts. The first part is the initial GF information; the second part is any other GF information; the third part is the field after the <TAB>. Replace this with the first field, <TAB>, third field, implausible marker pattern (@@@@@), second field, <TAB>, third field.
- Copy the modified line to the hold space.
- Delete the marker pattern to the end.
- Print.
- Swap the hold space into the pattern space.
- Remove everything up to and including the marker pattern.
- If we've done any work, go back to the redo label.
- Delete what's left (it was printed already).
- End of script block.
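To make the marker trick concrete, the sketch below runs a single pass of the substitution by hand on the first sample line, outside the loop. The variable names `line` and `marked` are made up for illustration, and `<TAB>` is treated as the literal 5-character text, as in the answer.

```shell
#!/bin/sh
# One pass of the main substitution from sed.script, without the loop.
line='GF:001,GF:00012,GF:01223<TAB>XXR'
marked=$(printf '%s\n' "$line" |
    sed 's/\(GF:[0-9]*\),\(.*\)<TAB>\(.*\)/\1<TAB>\3@@@@@\2<TAB>\3/')
printf '%s\n' "$marked"
# prints: GF:001<TAB>XXR@@@@@GF:00012,GF:01223<TAB>XXR

# The text before the marker is what gets printed on this iteration.
printf '%s\n' "$marked" | sed 's/@@@@@.*//'
# prints: GF:001<TAB>XXR

# The text after the marker is what goes back through the loop.
printf '%s\n' "$marked" | sed 's/.*@@@@@//'
# prints: GF:00012,GF:01223<TAB>XXR
```

In the real script the hold space plays the role of `marked`: it keeps the full marked line while the pattern space is cut down for printing.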
This is a simple loop that reduces the number of the patterns by one on each iteration.
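For comparison, here is a rough sketch of the Awk route mentioned at the top of the answer. It is not from the original post: the function name `split_gf_awk` is made up, and it assumes a real tab separator with exactly two fields per line.

```shell
#!/bin/sh
# Hypothetical Awk alternative: split field 1 on commas and repeat field 2.
split_gf_awk() {
    awk -F'\t' '{
        n = split($1, gf, ",")        # gf[1..n] holds the GF entries
        for (i = 1; i <= n; i++)
            print gf[i] "\t" $2       # one output line per entry
    }' "$@"
}
```

Called as `split_gf_awk input > output`, it needs no marker or hold space, because Awk can split the first field into an array directly.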