How to merge lines that start with the same items in a text file
Problem description
I have a text file containing a few thousand lines, like this:
File:
abc: bla1 bla1 bla1...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
abc: bla2 bla2 bla2...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
I want to merge all the lines that start with the same item (such as lines 1 and 5, or 3 and 7), so that I get a new text file like this:
New File:
abc: bla1 bla1 bla1... * abc: bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... * ghk: bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
I wonder whether this can be solved using regex and/or grep, and if so, how?
I'm quite familiar with grep because I'm on TextWrangler, but I'm also OK with other text editors.
Help much appreciated.
If order doesn't matter, I suggest first sorting the text. That will place
abc: ...
abc: ...
next to one another. Then run this regex for a few passes, repeating until nothing more is replaced (each pass merges only one pair of adjacent lines per item):
Search:
^(\w+): (.*)\n\1:
Replace:
\1: \2
Result:
abc: bla1 bla1 bla1... bla2 bla2 bla2...
bcd: bla bla bla...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
xyz: bla bla bla...
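For anyone who would rather script this than repeat the search in an editor, here is a minimal Python sketch of the same sort-then-merge idea. It reuses the search and replace patterns above unchanged; the function name `merge_sorted` is made up for illustration, and it assumes every line has the form `key: text`.

```python
import re

# Search pattern from the answer; re.MULTILINE makes ^ match at every line start.
ADJACENT = re.compile(r"^(\w+): (.*)\n\1:", re.MULTILINE)


def merge_sorted(text: str) -> str:
    """Sort the lines, then merge adjacent lines that share a key.

    Hypothetical helper: assumes each line looks like "key: text".
    """
    # Sorting puts lines with the same "key:" prefix next to each other.
    text = "\n".join(sorted(text.splitlines())) + "\n"
    while True:
        # One pass merges one adjacent pair per key, so repeat until stable.
        text, count = ADJACENT.subn(r"\1: \2", text)
        if count == 0:
            return text
```

The loop plays the role of the "few passes": a key that occurs three times needs two passes before all of its lines are joined.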
If order DOES matter, then this regex can be run through a few times:
Search:
^(\w+): (.*)\n((?:(?!\1).*\n)+)\1: (.*\n)
Replace:
\1: \2 \4\3
Result (1st pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1...
lmn: bla bla bla...
bcd: bla bla bla...
ghk: bla2 bla2 bla2...
xyz: bla bla bla...
Result (2nd pass):
abc: bla1 bla1 bla1... bla2 bla2 bla2...
cde: bla bla bla...
ghk: bla1 bla1 bla1... bla2 bla2 bla2...
lmn: bla bla bla...
bcd: bla bla bla...
xyz: bla bla bla...
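The order-preserving passes can also be scripted. The sketch below is only one possible way to do it: `merge_keep_order_regex` loops the search/replace pair above until nothing changes, and `merge_keep_order_dict` is a different technique (a plain dictionary, no regex) that does the same in a single pass. Both function names are made up for illustration, and both assume every line has the form `key: text` and that the text ends with a newline.

```python
import re

# Search pattern from the answer; re.MULTILINE makes ^ match at every line start.
SEPARATED = re.compile(
    r"^(\w+): (.*)\n((?:(?!\1).*\n)+)\1: (.*\n)", re.MULTILINE
)


def merge_keep_order_regex(text: str) -> str:
    """Repeat the answer's search/replace until no pass changes anything.

    Assumption: the text ends with a newline, because the pattern
    expects "\n" after the later line it pulls up.
    """
    while True:
        text, count = SEPARATED.subn(r"\1: \2 \4\3", text)
        if count == 0:
            return text


def merge_keep_order_dict(text: str) -> str:
    """Alternative single-pass merge using a dict instead of a regex.

    Assumption: every line contains ": " separating key and text.
    """
    merged = {}  # dicts keep insertion order (Python 3.7+)
    for line in text.splitlines():
        key, _, rest = line.partition(": ")
        merged.setdefault(key, []).append(rest)
    return "".join(
        f"{key}: {' '.join(parts)}\n" for key, parts in merged.items()
    )
```

One difference to be aware of: the regex requires at least one other line between the two lines being merged, so it leaves directly adjacent duplicates alone, whereas the dictionary version merges those as well.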