Remove Duplicate Lines From Text File?
Question
Given an input file of text lines, I want duplicate lines to be identified and removed. Please show a simple snippet of C# that accomplishes this.
Recommended answer
This should do it (and will cope with large files).
Note that it only removes consecutive duplicate lines. For example,
a
b
b
c
b
d
will end up as
a
b
c
b
d
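To check the consecutive-only behavior on the sample above without touching the file system, here is a minimal sketch that runs the same `lastLine` comparison as the answer's loop over in-memory text (`StringReader`/`StringWriter`). It assumes C# 9 or later (top-level statements); `RemoveConsecutiveDupes` is a hypothetical helper name, not part of the answer.

```csharp
using System;
using System.IO;

// The sample input from the answer, one entry per line.
string input = "a\nb\nb\nc\nb\nd\n";

string output = RemoveConsecutiveDupes(input);
Console.Write(output);   // prints a, b, c, b, d (one per line)

// Hypothetical helper: same comparison as the answer's loop, but reading
// from and writing to strings instead of files.
static string RemoveConsecutiveDupes(string text)
{
    using var reader = new StringReader(text);
    using var writer = new StringWriter();
    writer.NewLine = "\n";   // match the "\n"-separated input
    string? currentLine;
    string? lastLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Only a line equal to the one immediately before it is skipped.
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
    return writer.ToString();
}
```

The second `b` survives because the line before it is `c`, not `b`.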
If you want no duplicates anywhere, you'll need to keep a set of lines you've already seen.
using System;
using System.IO;

class DeDuper
{
    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: DeDuper <input file> <output file>");
            return;
        }
        using (TextReader reader = File.OpenText(args[0]))
        using (TextWriter writer = File.CreateText(args[1]))
        {
            string currentLine;
            string lastLine = null;
            while ((currentLine = reader.ReadLine()) != null)
            {
                if (currentLine != lastLine)
                {
                    writer.WriteLine(currentLine);
                    lastLine = currentLine;
                }
            }
        }
    }
}
Note that this assumes Encoding.UTF8, and that you want to use files. It's easy to generalize as a method though:
static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    string lastLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}
(Note that this doesn't close anything; the caller should do that.)
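Because the method leaves opening and closing to the caller, the caller is also where a non-UTF-8 encoding can be chosen. A minimal sketch of such a caller, assuming C# 9 or later and using UTF-16 (`Encoding.Unicode`) purely as an example encoding; `DedupeFile` and the temp-file names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical temp-file names, for illustration only.
string inputPath = Path.Combine(Path.GetTempPath(), "dedupe-input.txt");
string outputPath = Path.Combine(Path.GetTempPath(), "dedupe-output.txt");

// Write a small UTF-16 sample file so the sketch runs on its own.
File.WriteAllLines(inputPath, new[] { "caf\u00e9", "caf\u00e9", "na\u00efve" }, Encoding.Unicode);

DedupeFile(inputPath, outputPath, Encoding.Unicode);
string[] deduped = File.ReadAllLines(outputPath, Encoding.Unicode);
Console.WriteLine(deduped.Length);   // 2

// Hypothetical caller: opens both files with an explicit encoding and, via
// `using`, disposes (and therefore flushes and closes) them afterwards.
static void DedupeFile(string source, string destination, Encoding encoding)
{
    using var reader = new StreamReader(source, encoding);
    using var writer = new StreamWriter(destination, false, encoding);
    CopyLinesRemovingConsecutiveDupes(reader, writer);
}

// Same loop as the answer's method; it does not close either argument.
static void CopyLinesRemovingConsecutiveDupes(TextReader reader, TextWriter writer)
{
    string? currentLine;
    string? lastLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        if (currentLine != lastLine)
        {
            writer.WriteLine(currentLine);
            lastLine = currentLine;
        }
    }
}
```

Keeping the encoding decision in the caller means the same method also works unchanged with `StringReader`/`StringWriter` or network streams.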
Here's a version that will remove all duplicates, rather than just consecutive ones:
// Requires: using System.Collections.Generic; (for HashSet<T>)
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
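The `HashSet<string>` is what decides whether two lines count as duplicates, so giving it a comparer changes that rule without touching the loop. A sketch of a hypothetical overload that takes an `IEqualityComparer<string>` (assumes C# 9 or later), shown with `StringComparer.OrdinalIgnoreCase` so that lines differing only in letter case are treated as duplicates:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

var input = new StringReader("Apple\napple\nbanana\nApple\n");
var output = new StringWriter { NewLine = "\n" };
CopyLinesRemovingAllDupes(input, output, StringComparer.OrdinalIgnoreCase);
Console.Write(output.ToString());   // prints Apple, banana (one per line)

// Hypothetical overload: same algorithm as the answer's version, but the
// caller supplies the equality rule used by the HashSet.
static void CopyLinesRemovingAllDupes(TextReader reader, TextWriter writer, IEqualityComparer<string> comparer)
{
    string? currentLine;
    var previousLines = new HashSet<string>(comparer);
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns false when an equal line (per comparer) was already
        // seen, so only the first occurrence is written.
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
}
```

Unlike the consecutive-only version, this keeps every distinct line in memory, so memory use grows with the number of unique lines in the file.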