Batch to remove duplicate rows from text file

Question

Is it possible to remove duplicate rows from a text file? If yes, how?

Answer

Yes, but like most text file processing in batch, it is neither pretty nor particularly fast.

This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the first and only argument to the batch script.

@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "sorted=%file%.sorted"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
sort "%file%" >"%sorted%"
>"%deduped%" (
  set "prev="
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    if /i "!ln!" neq "!prev!" (
      endlocal
      (echo %%A)
      set "prev=%%A"
    ) else endlocal
  )
)
>nul move /y "%deduped%" "%file%"
del "%sorted%"

This solution is case sensitive and leaves the lines in their original order (except for duplicates, of course). Again, the name of the file is passed in as the first and only argument.

@echo off
setlocal disableDelayedExpansion
set "file=%~1"
set "line=%file%.line"
set "deduped=%file%.deduped"
::Define a variable containing a linefeed character
set LF=^


::The 2 blank lines above are critical, do not remove
>"%deduped%" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    rem Double each backslash: findstr treats \ as an escape character, even with /l
    >"%line%" (echo !ln:\=\\!)
    >nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
    endlocal
  )
)
>nul move /y "%deduped%" "%file%"
2>nul del "%line%"


Edit

Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.

I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless of what the first character is. The modified code sets the EOL option to a linefeed character.
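
To illustrate why the EOL option matters, here is a minimal sketch of the default behaviour, where FOR /F uses a semicolon as its EOL character. The scratch file name eol_demo.txt is only an assumption for this demonstration. When run, only the second line should be echoed, because the line beginning with a semicolon is skipped.

```batch
@echo off
setlocal disableDelayedExpansion
rem Hypothetical scratch file used only for this demonstration
set "demo=%temp%\eol_demo.txt"
rem Write two lines, the first of which begins with a semicolon
>"%demo%" (
  echo(;starts with a semicolon
  echo regular line
)
rem With the default EOL of ";", FOR /F silently skips the first line
for /f "usebackq delims=" %%A in ("%demo%") do echo %%A
del "%demo%"
```

Setting EOL to a linefeed, as the scripts above do, avoids this because no line read by FOR /F can begin with a linefeed.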


New solution 2016-04-13: JSORT.BAT

You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.

@jsort file.txt /u >file.txt.new
@move /y file.txt.new file.txt >nul
