Match Partially Duplicated Lines

Problem description

I have rows in a list that are sometimes similar up to the first "space" character, then can change (i.e. a date afterwards).

wsmith jul/12/12
bwillis jul/13/13
wsmith jul/14/12
tcruise jul/12/12

I can easily sort the lines, but I'd love to remove the duplicate later dated entry. I did find a regex suggestion, but it matches only exactly the same lines. I need to be able to mark the entire row of similar usernames in the file. In my example above, lines 1 and 3 would be highlighted.

(edited for clarity)

Recommended answer

A compact pattern in PCRE-style syntax (which Notepad++ supports) to see whether a username is repeated on a later line would be:

(?m)^(\S+).*\R(?s).*?\K\1

This works in Notepad++.

As you remove duplicate lines, more may become marked: each match consumes the in-between lines on its way to the duplicate, so usernames that first appear on those lines are not checked until you search again.
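
Outside the editor, the same cleanup can be done in a single pass, which sidesteps the re-marking issue. The following is a minimal Python sketch, not part of the original answer. It assumes the username is everything before the first whitespace and that the rows are already in the order in which the first occurrence should be kept. The function name `drop_later_duplicates` is made up for illustration.

```python
def drop_later_duplicates(lines):
    """Keep the first row for each username; drop later rows that reuse it."""
    seen = set()
    kept = []
    for line in lines:
        parts = line.split(None, 1)   # split on the first run of whitespace
        if not parts:                 # blank line: keep it unchanged
            kept.append(line)
            continue
        username = parts[0]
        if username in seen:          # a later row for a known username
            continue
        seen.add(username)
        kept.append(line)
    return kept


rows = [
    "wsmith jul/12/12",
    "bwillis jul/13/13",
    "wsmith jul/14/12",
    "tcruise jul/12/12",
]
print(drop_later_duplicates(rows))
```

With the sample rows from the question, the third row (the second `wsmith` entry) is the one that gets dropped.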

Explanation


  • (?m) turns on multi-line mode, allowing ^ and $ to match at the start and end of each line
  • The ^ anchor asserts that we are at the beginning of a line
  • (\S+) captures the leading run of non-whitespace characters (the username) into Group 1
  • .* matches the rest of the line
  • \R matches a line break
  • (?s) activates DOTALL mode, allowing the dot to match across lines
  • .*? lazily matches characters up to ...
  • \K tells the engine to drop what was matched so far from the final match it returns
  • \1 is a back-reference: it matches what Group 1 captured before
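
To experiment with the pattern outside Notepad++, here is a rough Python adaptation, offered as a sketch only. Python's built-in `re` module does not support `\K` or `\R`, so this version uses a second capture group in place of `\K`, `\n` in place of `\R`, and a scoped `(?s:...)` group for the DOTALL part. The helper name `first_repeat` is hypothetical.

```python
import re

# Group 1: the username at the start of a line.
# Group 2: the later repetition (stands in for what \K would leave as the match).
PATTERN = re.compile(r"^(\S+).*\n(?s:.*?)(\1)", re.MULTILINE)


def first_repeat(text):
    """Return (username, line number of the repetition), or None if no repeat."""
    m = PATTERN.search(text)
    if m is None:
        return None
    line_no = text.count("\n", 0, m.start(2)) + 1   # 1-based line number
    return m.group(1), line_no


sample = (
    "wsmith jul/12/12\n"
    "bwillis jul/13/13\n"
    "wsmith jul/14/12\n"
    "tcruise jul/12/12\n"
)
print(first_repeat(sample))
```

As with the original pattern, the back-reference is not anchored, so it can match the captured text anywhere later in the input, not only at the start of a line.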
