Remove Duplicate Lines From Text File?

Question

Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.

Recommended answer

This should do it (and will cope with large files).

Note that it only removes consecutive duplicate lines, i.e.

a
b
b
c
b
d

will end up as

a
b
c
b
d

If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.

using System;
using System.IO;

class DeDuper
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: DeDuper <input file> <output file>");
            return;
        }
        using (TextReader reader = File.OpenText(args[0]))
        using (TextWriter writer = File.CreateText(args[1]))
        {
            string currentLine;
            string lastLine = null;

            while ((currentLine = reader.ReadLine()) != null)
            {
                if (currentLine != lastLine)
                {
                    writer.WriteLine(currentLine);
                    lastLine = currentLine;
                }
            }
        }
    }
}

Note that this assumes Encoding.UTF8 (the encoding that File.OpenText and File.CreateText use), and that you want to use files. It's easy to generalize as a method though:

static void CopyLinesRemovingConsecutiveDupes
    (TextReader reader, TextWriter writer)
{
    string currentLine;
    string lastLine = null;

    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}

(Note that this doesn't close anything - the caller should do that.)
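As a rough sketch of what the caller's side could look like, the following wraps the method in a helper that opens both files with an explicit encoding and disposes of them afterwards. The class name EncodingAwareDeDuper, the helper DedupeFile and the choice of UTF-16 are assumptions made for illustration only; they are not part of the original answer.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingAwareDeDuper
{
    // Same loop as CopyLinesRemovingConsecutiveDupes above, repeated here
    // so that this sketch compiles on its own.
    public static void CopyLinesRemovingConsecutiveDupes(
        TextReader reader, TextWriter writer)
    {
        string currentLine;
        string lastLine = null;

        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine != lastLine)
            {
                writer.WriteLine(currentLine);
                lastLine = currentLine;
            }
        }
    }

    // Hypothetical helper: opens both files with the given encoding and
    // disposes of them when done, since the copy method closes nothing itself.
    public static void DedupeFile(
        string inputPath, string outputPath, Encoding encoding)
    {
        using (TextReader reader = new StreamReader(inputPath, encoding))
        using (TextWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            CopyLinesRemovingConsecutiveDupes(reader, writer);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine(
                "Usage: EncodingAwareDeDuper <input file> <output file>");
            return;
        }

        // Assumption for illustration: the files are UTF-16 rather than UTF-8.
        DedupeFile(args[0], args[1], Encoding.Unicode);
    }
}
```

Because the copy method only sees a TextReader and a TextWriter, the same loop should work unchanged with in-memory strings, network streams, or any other encoding the caller picks.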

Here's a version that will remove all duplicates, rather than just consecutive ones:

// Requires "using System.Collections.Generic;" for HashSet<string>.
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();

    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
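To see the difference on the sample input from earlier, here is a small in-memory sketch of the set-based version. The class name AllDupesDemo and the extra comparer parameter are assumptions added for illustration (passing StringComparer.OrdinalIgnoreCase instead would treat lines differing only in case as duplicates); the original method takes just the reader and writer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class AllDupesDemo
{
    // Variant of CopyLinesRemovingAllDupes with a hypothetical comparer
    // parameter that decides when two lines count as equal.
    public static void CopyLinesRemovingAllDupes(
        TextReader reader, TextWriter writer, IEqualityComparer<string> comparer)
    {
        string currentLine;
        var previousLines = new HashSet<string>(comparer);

        while ((currentLine = reader.ReadLine()) != null)
        {
            // Add returns false when the line has been seen before,
            // so only the first occurrence of each line is written.
            if (previousLines.Add(currentLine))
            {
                writer.WriteLine(currentLine);
            }
        }
    }

    public static void Main()
    {
        // The sample input from above: a, b, b, c, b, d.
        using (var reader = new StringReader("a\nb\nb\nc\nb\nd"))
        using (var writer = new StringWriter())
        {
            CopyLinesRemovingAllDupes(reader, writer, StringComparer.Ordinal);
            Console.Write(writer.ToString()); // prints a, b, c, d on separate lines
        }
    }
}
```

Unlike the consecutive-only version, the later b is dropped as well. The trade-off is memory: the set keeps every distinct line seen so far, so usage grows with the number of distinct lines rather than staying constant.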
