How to delete the duplicate lines in a file except the first matched line


Problem description

In the configuration file

/etc/fine-tune.conf

the following line is duplicated:

clean_history_in_os=true

We want to delete all lines that contain clean_history_in_os=true except the first matching line in the file.

What I have tried so far:

  sed  -i '/clean_history_in_os=true/d' /etc/fine-tune.conf

The problem is that sed deletes all of the clean_history_in_os=true lines, including the first one.

I would be happy to get ideas for solving this.

Recommended answer

Using Perl:

perl -i -ne'next if /clean_history_in_os=true/ && ++$ok > 1; print' file

This increments a counter each time a line matches the pattern. If the counter is greater than 1 the line is skipped; otherwise it is printed.

The question came up of how to pass the pattern to Perl when it is held in a shell variable. Below I assume that the shell variable $VAR contains the string clean_history...

In all of the following, the shell variable is used directly as a pattern in a regex. If it holds the literal string from the question, the code below works as given. However, if it may contain characters that are special in a regex, they should be escaped, so you may want to precede the pattern with \Q where it is used in the regex. As a general note, take care never to run input from the shell as code (say under /e).

Pass it as an argument, which is then available in @ARGV:

perl -i -ne'
    BEGIN { $qr=shift; }; 
    next if /$qr/ && ++$ok > 1; print
' "$VAR" file

Here the BEGIN block runs in the BEGIN phase, before runtime, so it executes once and not on each iteration over the lines. In it, shift removes the first element from @ARGV, which in the above invocation is the value of $VAR, already interpolated by the shell. The file name then remains in @ARGV and is processed under -n (the file is opened and its lines are iterated over).

Use the -s switch, which enables command-line switches for the program:

perl -i -s -ne'next if /$qr/ && ++$ok > 1; print' -- -qr="$VAR" file

The -- (after the one-line program in '') marks the start of the arguments for the program. Then -qr introduces a variable $qr into the program, with the value assigned to it as above (with just -qr, and no =value, the variable $qr gets the value 1, so it acts as a flag).

Any such options must come before the file names. They are removed from @ARGV, so the program can then process the submitted files as usual.

Export the bash variable, making it an environment variable, which can then be accessed in the Perl program via the %ENV hash:

export VAR="clean_history..."
perl -i -ne'next if /$ENV{VAR}/ && ++$ok > 1; print' file

But I would recommend either of the first two options over this one.

A refinement of the question, given in a comment, specifies that if the phrase clean_... is preceded by a # then that line should be skipped altogether (never printed). It is simplest to test for that separately:

next if /#$qr/; next if /$qr/ && ++$ok > 1; print

or, relying on short-circuiting:

next if /#$qr/ || (/$qr/ && ++$ok > 1); print

The first version is less error-prone and probably clearer.
