Match Partially Duplicated Lines
Question
I have rows in a list that are sometimes identical up to the first space character and then differ after it (e.g., a date).
wsmith jul/12/12
bwillis jul/13/13
wsmith jul/14/12
tcruise jul/12/12
I can easily sort the lines, but I'd like to remove the later-dated duplicate entry. I found a regex suggestion, but it only matches lines that are exactly the same. I need to be able to mark the entire row for lines that share a username. In my example above, lines 1 and 3 would be highlighted.
(edited for clarity)
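If a regex is not strictly required, the same marking can be done by keying each row on the text before its first space. Below is a minimal Python sketch under that assumption; the function names are hypothetical, and it treats the earlier row in the file as the one to keep, as in the sample.

```python
from collections import defaultdict


def username(line):
    # Assumption: the username is everything before the first space.
    return line.split(" ", 1)[0]


def marked_line_numbers(lines):
    """1-based numbers of every row whose username appears more than once."""
    rows_by_user = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        rows_by_user[username(line)].append(number)
    return sorted(
        n
        for numbers in rows_by_user.values()
        if len(numbers) > 1
        for n in numbers
    )


def drop_later_duplicates(lines):
    """Keep the first row for each username and drop the later ones.

    Assumption: the earlier row in the file is the one to keep.
    """
    seen = set()
    kept = []
    for line in lines:
        name = username(line)
        if name not in seen:
            seen.add(name)
            kept.append(line)
    return kept


rows = [
    "wsmith jul/12/12",
    "bwillis jul/13/13",
    "wsmith jul/14/12",
    "tcruise jul/12/12",
]
print(marked_line_numbers(rows))  # [1, 3]
print(drop_later_duplicates(rows))
```

Keying on the username avoids any backtracking subtleties, at the cost of leaving the editor.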
Answer
A compact pattern in PCRE-style (Perl-compatible) syntax, which Notepad++ supports, for checking whether the first word of one row is repeated on a later row is:
(?m)^(\S+).*\R(?s).*?\K\1
This works in Notepad++ (N++) with the search mode set to Regular expression.
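To experiment outside Notepad++, the pattern can be approximated in Python. Python's re module supports neither \K nor \R, so this sketch (not the original answer's code) uses \n in place of \R and a second capture group to record where the \K-trimmed match would start; find_repeats and line_of are hypothetical helper names.

```python
import re

# Adaptation of (?m)^(\S+).*\R(?s).*?\K\1 for Python's re: \n replaces \R,
# the scoped (?s:...) replaces the mid-pattern (?s), and group 2 marks
# where the \K-trimmed match would begin.
DUP = re.compile(r"^(\S+).*\n(?s:.*?)(\1)", re.MULTILINE)


def line_of(text, pos):
    """1-based line number of the character at offset pos."""
    return text.count("\n", 0, pos) + 1


def find_repeats(text):
    """(first_line, repeat_line) for each non-overlapping match, in order."""
    return [
        (line_of(text, m.start()), line_of(text, m.start(2)))
        for m in DUP.finditer(text)
    ]


sample = "wsmith jul/12/12\nbwillis jul/13/13\nwsmith jul/14/12\ntcruise jul/12/12\n"
print(find_repeats(sample))  # [(1, 3)]
```

On the sample it pairs line 1 with line 3. One caveat applies to the original pattern as well: (\S+) can backtrack to a shorter prefix and \1 is not tied to the start of a line, so for the two rows "wsmith jul/12/12" and "bwillis jul/13/13" it reports a hit on the "w" inside bwillis.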
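As you remove duplicate lines, more lines may become marked. Each match spans everything from a username's first occurrence to its repeat, and the next search resumes after that repeat, so any duplicates whose first occurrence lies on the in-between lines are skipped until the earlier repeats are removed.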
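The effect can be reproduced by deleting every flagged repeat and searching again until nothing is flagged. The Python sketch below does that; remove_repeats is a hypothetical name, and it deliberately swaps in a stricter variant of the pattern: (?!\S) after each name forces whole usernames, and ^ anchors the repeat at the start of a line, so a shorter prefix cannot match inside another word. In Notepad++ syntax the same variant would presumably read (?m)^(\S+)(?!\S).*\R(?s).*?^\K\1(?!\S), offered here as an untested suggestion.

```python
import re

# Stricter variant (an assumption, not the original answer's pattern):
# (?!\S) forces whole usernames and ^ anchors the repeat at a line start.
STRICT = re.compile(r"^(\S+)(?!\S).*\n(?s:.*?)^(\1)(?!\S)", re.MULTILINE)


def line_of(text, pos):
    """1-based line number of the character at offset pos."""
    return text.count("\n", 0, pos) + 1


def remove_repeats(text):
    """Delete flagged repeats pass by pass; return the text and each pass's hits."""
    passes = []
    while True:
        # One pass = all non-overlapping matches, like marking all matches once.
        hits = [line_of(text, m.start(2)) for m in STRICT.finditer(text)]
        if not hits:
            return text, passes
        passes.append(hits)
        lines = text.split("\n")  # same notion of "line" as line_of
        for n in reversed(hits):  # delete bottom-up so numbers stay valid
            del lines[n - 1]
        text = "\n".join(lines)


sample = (
    "wsmith jul/12/12\n"
    "bwillis jul/13/13\n"
    "wsmith jul/14/12\n"
    "bwillis jul/15/13\n"
)
result, passes = remove_repeats(sample)
print(passes)  # [[3], [3]]
print(result)
```

With these four rows the first pass flags only line 3, because the match that starts at line 1 runs past the first bwillis row; the bwillis repeat is flagged only on the second pass.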
Explanation
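- (?m) turns on multi-line mode, allowing ^ and $ to match at the start and end of each line.
- The ^ anchor asserts that we are at the beginning of a line.
- (\S+) captures the leading run of non-space characters to Group 1.
- .* matches the rest of the line.
- \R matches the line break (any newline sequence, such as \r\n, \n or \r).
- (?s) activates DOTALL mode for the rest of the pattern, allowing the dot to match across lines.
- .*? lazily matches as few characters as possible, up to the point where the rest of the pattern can match.
- \K tells the engine to drop everything matched so far from the match it finally returns, so only the repeated name is highlighted.
- \1 is a back-reference: it matches exactly what Group 1 captured.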
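The breakdown above can be mirrored token by token in a verbose Python pattern. This sketch makes three Python-specific adaptations (assumptions, not part of the original answer): \n replaces \R, a scoped (?s:...) group replaces the mid-pattern (?s), and a second capture group stands in for \K. It also shows what \K changes: the full match runs from the start of line 1 to the repeated name, while the part PCRE would report is only the name itself.

```python
import re

# Python port of (?m)^(\S+).*\R(?s).*?\K\1, one token per line.
ANNOTATED = re.compile(
    r"""
    ^         # (?m) is on: start of a line, not only of the string
    (\S+)     # Group 1: the leading run of non-space characters
    .*        # the rest of that line; the dot stops at line breaks here
    \n        # the line break
    (?s:.*?)  # DOTALL on: lazily skip as few characters as possible
    (\1)      # back-reference to Group 1; Group 2 = what \K would keep
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = "wsmith jul/12/12\nbwillis jul/13/13\nwsmith jul/14/12\ntcruise jul/12/12\n"
m = ANNOTATED.search(sample)
print(repr(m.group(0)))  # full match: line 1 through the repeated name
print(repr(m.group(2)))  # 'wsmith', the only part PCRE reports after \K
```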