How to delete the duplicate lines in a file except the first matched line
Question
In the configuration file below
/etc/fine-tune.conf
we have duplicate lines
clean_history_in_os=true
We want to delete every line that contains clean_history_in_os=true except the first matching line in the file.
What I have done so far is
sed -i '/clean_history_in_os=true/d' /etc/fine-tune.conf
but the problem is that sed deletes all of the "clean_history_in_os=true" lines, including the first one.
I would be happy to get ideas on how to solve this.
Recommended answer
Using Perl
perl -i -ne'next if /clean_history_in_os=true/ && ++$ok > 1; print' file
This increments the counter each time a line matches; if the counter is then greater than 1 the line is skipped, otherwise it is printed.
The question came up of how to pass the pattern to Perl when it is held in a shell variable. Below I assume that the shell variable $VAR contains the string clean_history...
In all of these, the shell variable's value is used directly as a pattern in a regex. If it is the literal string from the question then the code below works as given. However, if it may contain special characters they should be escaped, so you may want to precede the pattern with \Q when it is used in the regex. As a general note, take care not to use input from the shell to run code (say under /e).
Pass it as an argument, which is then available in @ARGV
perl -i -ne'
BEGIN { $qr=shift; };
next if /$qr/ && ++$ok > 1; print
' "$VAR" file
where the BEGIN block runs in the BEGIN phase, before runtime (so not on each of the following iterations). In it, shift removes the first element from @ARGV, which in the invocation above is the value of $VAR, first interpolated by the shell. The filename file then remains in @ARGV, so it is available for processing under -n (the file is opened and its lines are iterated over).
Use the -s switch, which enables command-line switches for the program
perl -i -s -ne'next if /$qr/ && ++$ok > 1; print' -- -qr="$VAR" file
The -- (after the one-line program under '') marks the start of arguments for the program; then -qr introduces a variable $qr into the program, with the value assigned to it as above (with just -qr the variable $qr gets the value 1, so it acts as a flag).
Any such options must come before any filenames, and they are removed from @ARGV so that the program can then process the submitted files normally.
Export the bash variable, making it an environment variable which can then be accessed in the Perl program via the %ENV hash
export VAR="clean_history..."
perl -i -ne'next if /$ENV{VAR}/ && ++$ok > 1; print' file
or, if $VAR is used only in this one-liner, you can use the shorter form (which must be on one line)
VAR="clean_history..." perl -i -ne'...' file
I would recommend either of the first two options over this one.
These are ways to pass input to a Perl program entered entirely on the command line (a one-liner), without using STDIN or files. With a script it is better to use a library, in the first place Getopt::Long.
A refinement of the question, given in a comment, specifies that if the phrase clean_... starts with a # then that line should be skipped altogether. It is simplest to test for that separately
next if /#$qr/; next if /$qr/ && ++$ok > 1; print
or, relying on short-circuiting
next if /#$qr/ || (/$qr/ && ++$ok > 1); print
The first version is less error-prone and probably clearer.