Removing Duplicate lines with random text behind it

Problem description

I have text like this in Notepad++:

Random Text Here:188.0.0.0
Random Text Here:188.0.3.0
Random Text Here:188.2.0.0

However, some of the numbers at the end are duplicated and I want to get rid of the duplicates. For example:

Random Text Here:188.0.3.0
Random Different Text Here:188.0.3.0

How would I go about doing that en masse, as there are thousands of these lines?

Recommended answer

In Notepad++ I would try the following multi-step process.

(1) Use a regular expression to move the IP address and a fixed marker to the front of every line, changing Random Text Here:188.0.0.0 to :188.0.0.0!!Random Text Here.

(2) Use TextFX to sort the file, removing duplicates.

(3) Use a regular expression to find and remove lines that repeat an IP address. This may need multiple passes.

(4) Use a regular expression to put the text back in the right order.

(5) (Optional) Sort the file again.

Problems with the above method:

(a) For each IP address, the "random text" that sorts first is the one that is kept, not the one that appears first in the original file.

(b) The result will be ordered by IP address or by the random text, depending on whether step (5) is used.
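
If either limitation matters, the file can be processed outside Notepad++ instead. The following is a minimal Python sketch of a different technique from the sort-based steps in this answer: it walks the lines once and keeps the first line seen for each IP address, so the original order is preserved. It assumes every relevant line ends with :a.b.c.d; the function name keep_first_per_ip is hypothetical.

```python
import re

# Assumption: the IP address sits at the very end of the line, after a colon.
IP_AT_END = re.compile(r":(\d+\.\d+\.\d+\.\d+)$")

def keep_first_per_ip(lines):
    """Keep the first line for each trailing IP, preserving input order."""
    seen = set()
    result = []
    for line in lines:
        match = IP_AT_END.search(line)
        # Lines without a trailing IP are de-duplicated on their full text.
        key = ("ip", match.group(1)) if match else ("line", line)
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result
```

Unlike the Notepad++ steps, this keeps Random Text Here:188.0.3.0 rather than Random Different Text Here:188.0.3.0 when the former appears first in the file.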

In more detail:

(0) Choose a character or a short string that does not occur in the input file. I will use !!.

(1) Do a regular expression replace on the file (with the ". matches newline" option unselected) to change ^(.*)(:\d+\.\d+\.\d+\.\d+)$ to $2!!$1.

(2) Use TextFX to sort the file. Specifying sort unique may be useful to reduce the number of lines.

(3) Do a regular expression replace on the file (with the ". matches newline" option unselected) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)\r\n\1!!.*$ to $1!!$2. When there are several lines with the same IP address, each pass removes about half of them, because every match consumes a pair of adjacent lines and keeps only the first. Run the same replacement several times until it reports that no changes have been made. You may need to alter the \r\n part depending on the line endings in your file (for example, \n for Unix line endings).

(4) Do a regular expression replace on the file (with the ". matches newline" option unselected) to change ^(:\d+\.\d+\.\d+\.\d+)!!(.*)$ to $2$1.

(5) (Optional) Sort the file again.
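
To see what steps (1) to (4) do to the data, they can be imitated in a short script. This is only a sketch in Python, not part of the Notepad++ procedure. It assumes \n line endings, the !! marker from step (0), and Python's default code-point sort in place of the TextFX sort, which may order lines differently. The function name dedupe_by_ip is hypothetical. In step (3) the back-reference is followed by the marker so that one IP address cannot match as a prefix of a longer one.

```python
import re

SEP = "!!"                       # step (0): marker assumed absent from the input
IP = r":\d+\.\d+\.\d+\.\d+"      # colon followed by a dotted quad

def dedupe_by_ip(lines):
    """Imitate steps (1)-(4): swap, sort unique, collapse pairs, swap back."""
    if not lines:
        return []
    text = "\n".join(lines)
    # Step (1): "text:IP" -> ":IP!!text"
    text = re.sub(rf"^(.*)({IP})$", rf"\2{SEP}\1", text, flags=re.MULTILINE)
    # Step (2): sort, dropping exact duplicate lines
    text = "\n".join(sorted(set(text.split("\n"))))
    # Step (3): collapse adjacent lines sharing an IP; repeat until no change
    pair = re.compile(rf"^({IP}){SEP}(.*)\n\1{SEP}.*$", re.MULTILINE)
    while True:
        text, count = pair.subn(rf"\1{SEP}\2", text)
        if count == 0:
            break
    # Step (4): ":IP!!text" -> "text:IP"
    text = re.sub(rf"^({IP}){SEP}(.*)$", r"\2\1", text, flags=re.MULTILINE)
    return text.split("\n")
```

Run on the four sample lines from the question, this keeps Random Different Text Here:188.0.3.0 and drops Random Text Here:188.0.3.0, because "Different" sorts before "Text". That is problem (a) above in action.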
