How to delete the duplicate lines in a file except the first matched line

Problem description

In the following configuration file

/etc/fine-tune.conf

we have duplicate lines of

clean_history_in_os=true

We want to delete all the lines that contain clean_history_in_os=true except the first matching line in the file.

What I did so far is

  sed  -i '/clean_history_in_os=true/d' /etc/fine-tune.conf

But the problem is that sed deletes all the "clean_history_in_os=true" lines.

I would be happy to get ideas for solving this issue.

Recommended answer

With Perl

perl -i -ne'next if /clean_history_in_os=true/ && ++$ok > 1; print' file

This increments the counter on every line that matches; if the counter is then greater than 1 the line is skipped, otherwise it is printed.

The question came up of how to pass the pattern to Perl if we have it as a shell variable. Below I assume that the shell variable $VAR contains the string clean_history...

In all of these a shell variable's value is used directly as a pattern in a regex. If it's the literal string from the question then the code below works as given. However, if there may be special characters they should be escaped, so you may want to precede the pattern with \Q when it is used in the regex. As a general note, one should take care not to use input from the shell to run code (say under /e).

  • Pass it as an argument, which is then available in @ARGV

  perl -i -ne'
      BEGIN { $qr=shift; }; 
      next if /$qr/ && ++$ok > 1; print
  ' "$VAR" file

where the BEGIN block runs in the BEGIN phase, before runtime (so it is not repeated for the following iterations). In it, shift removes the first element from @ARGV, which in the above invocation is the value of $VAR, first interpolated by the shell. The filename file then remains in @ARGV, so it is available for processing under -n (the file is opened and its lines are iterated over).

  • Use the -s switch, which enables command-line switches for the program

  perl -i -s -ne'next if /$qr/ && ++$ok > 1; print' -- -qr="$VAR" file

The -- (after the one-line program under '') marks the start of arguments for the program; then -qr introduces a variable $qr into the program, with the value assigned to it as above (with just -qr the variable $qr gets the value 1, so it acts as a flag).

Any such options must come before possible filenames, and they are removed from @ARGV so the program can then process the submitted files normally.

  • Export the bash variable, making it an environment variable which can then be accessed in the Perl program via the %ENV hash

  export VAR="clean_history..."
  perl -i -ne'next if /$ENV{VAR}/ && ++$ok > 1; print' file

or, if $VAR is used only in this one-liner, use the shorter form (which must be on one line)

    VAR="clean_history..."  perl -i -ne'...' file
    

I would rather recommend either of the first two options over this one.

These are ways to pass input to a Perl program entered entirely on the command line (a one-liner), without STDIN or files. With a script it is better to use a library, in the first place Getopt::Long.

A refinement of the question, given in a comment, specifies that if the phrase clean_... is preceded by a # then that line should be skipped altogether. It's simplest to test for that separately

next if /#$qr/; next if /$qr/ && ++$ok > 1; print

Or, relying on short-circuiting

next if /#$qr/ || (/$qr/ && ++$ok > 1); print

The first version is less error-prone and probably clearer.
