How to delete the duplicate lines in a file except the first matched line
Question
In the following configuration file
/etc/fine-tune.conf
we have the duplicated line
clean_history_in_os=true
We want to delete all the lines that contain clean_history_in_os=true except the first matched line in the file.
What I have done so far is
sed -i '/clean_history_in_os=true/d' /etc/fine-tune.conf
but the problem is that sed deletes all the "clean_history_in_os=true" lines, including the first one.
I would be happy to get ideas for solving this issue.
Recommended answer
Using Perl
perl -i -ne'next if /clean_history_in_os=true/ && ++$ok > 1; print' file
This increments the counter on every line that matches; if the counter is then greater than 1 the line is skipped, otherwise it is printed.
The question came up of how to pass the pattern to Perl if we have it as a shell variable. Below I assume that the shell variable $VAR contains the string clean_history...
In all of these, the shell variable is used directly as a pattern in a regex. If it is the literal string from the question, then the code below works as given. However, if it may contain special characters, they should be escaped, so you may want to precede the pattern with \Q when it is used in a regex. As a general note, take care not to use input from the shell to run code (say, under /e).
Pass it as an argument, which is then available in @ARGV
perl -i -ne'
BEGIN { $qr=shift; };
next if /$qr/ && ++$ok > 1; print
' "$VAR" file
where the BEGIN block runs in the BEGIN phase, before runtime (so it does not run again on the following iterations). In it, shift removes the first element from @ARGV, which in the above invocation is the value of $VAR, first interpolated by the shell. The filename file then remains in @ARGV, so it is available for processing under -n (the file is opened and its lines are iterated over).
Use the -s switch, which enables command-line switches for the program
perl -i -s -ne'next if /$qr/ && ++$ok > 1; print' -- -qr="$VAR" file
The -- (after the one-line program under '') marks the start of arguments for the program; then -qr introduces a variable $qr into the program, with a value assigned to it as above (with just -qr, the variable $qr gets the value 1, so it acts as a flag).
Any such options must come before possible filenames, and they are removed from @ARGV, so the program can then process the submitted files normally.
Export the bash variable, making it an environment variable, which can then be accessed in the Perl program via the %ENV hash
export VAR="clean_history..."
perl -i -ne'next if /$ENV{VAR}/ && ++$ok > 1; print' file
But I would rather recommend either of the first two options over this one.
A refinement of the question given in a comment specifies that if the phrase clean_... starts with a # then that line should be skipped altogether. It is simplest to test for that separately
next if /#$qr/; next if /$qr/ && ++$ok > 1; print
or, relying on short-circuiting,
next if /#$qr/ || (/$qr/ && ++$ok > 1); print
The first version is less error-prone and probably clearer.