How to delete nonconsecutive lines in text using RegEx?

Problem description

I use the following expression in Notepad++ to delete duplicate lines:

^(.*)(\r?\n\1)+$

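The problems are:

  1. It only works for single-word lines; it does not work if a line contains spaces.
  2. It only works for consecutive duplicate lines.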
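
The second limitation can be reproduced outside Notepad++. The sketch below uses Python's re module, whose engine is not identical to the one in Notepad++, and it assumes the replacement string is \1 (the question does not state it). The function name and sample text are illustrative only.

```python
import re

# The expression from the question. re.MULTILINE makes ^ and $ act as
# line anchors, which Python does not do by default.
CONSECUTIVE_DUPLICATES = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE)


def collapse_consecutive(text: str) -> str:
    """Collapse each run of identical adjacent lines into a single line."""
    # Assumption: the whole run is replaced with the captured line (\1).
    return CONSECUTIVE_DUPLICATES.sub(r"\1", text)


text = "apple\napple\nbanana\napple"
print(collapse_consecutive(text))
```

The adjacent pair of apple lines collapses into one, but the later, nonconsecutive apple line is left in place.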

Is there a solution (preferably a regular expression or macro) to delete duplicate lines that contain spaces and are nonconsecutive?

Recommended answer

Since no one is interested, I will post what I think you need.

delete duplicate lines in a text that contains space, and that are nonconsecutive

I assume your text contains duplicate lines such as "My Line One and some text" and "My Line Two and more text":

My Line One and some text
My Line One and some text
My Line Two and more text
My Line One and some text
My Line Two and more text

These duplicate lines are not all consecutive (only the first two are adjacent).

So, you can remove duplicate lines by running this search and replace:

^(.+)\r?\n(?=[\s\S]*?^\1$)

Replace with an empty string.

Regex note: ^ and $ are treated as line start/end anchors by default, so we match a single line and capture its text with ^(.+). Then we match the line break (\r\n or \n) with \r?\n. The look-ahead (?=...) checks whether, after any amount of text (matched lazily with [\s\S]*?), there is a later line with the same contents (^\1$, where \1 is a backreference to the captured line text). Because a line is deleted only when an identical line follows it, the last occurrence of each duplicate is the one that remains.
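
To try the same search and replace outside Notepad++, here is a minimal sketch using Python's re module. It assumes Python's engine behaves like Notepad++'s for this pattern. The helper name dedupe_keep_last is hypothetical and not part of the original answer. re.MULTILINE is passed explicitly because Python does not treat ^ and $ as line anchors by default.

```python
import re

# Same pattern as in the answer. re.MULTILINE makes ^ and $ match at
# line boundaries instead of only at the start/end of the whole text.
DUPLICATE_LINE = re.compile(r"^(.+)\r?\n(?=[\s\S]*?^\1$)", re.MULTILINE)


def dedupe_keep_last(text: str) -> str:
    """Delete every line that has an identical copy later in the text.

    Hypothetical helper: only the last occurrence of each line survives.
    """
    return DUPLICATE_LINE.sub("", text)


sample = (
    "My Line One and some text\n"
    "My Line One and some text\n"
    "My Line Two and more text\n"
    "My Line One and some text\n"
    "My Line Two and more text"
)

print(dedupe_keep_last(sample))
```

On the sample text this leaves one copy of each line. The look-ahead rescans the rest of the text for every line, so the approach may be slow on very large files.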
