Comparing the contents of two huge text files quickly

Problem description

What I'm basically trying to do is compare two HUGE text files and, when lines match, write out a string. I have this written, but it's extremely slow, so I was hoping you guys might have a better idea. In the example below I'm comparing splitcollect[3] with spltifound[0].

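        // start and end are progress counters declared elsewhere (not shown).
        // Despite the names, collectionlist holds the paths (found.txt)
        // and foundlist holds the Name|Folder lines (collection_export.txt).
        string[] collectionlist = File.ReadAllLines(@"C:\found.txt");
        string[] foundlist = File.ReadAllLines(@"C:\collection_export.txt");
        foreach (string found in foundlist)
        {
            string[] spltifound = found.Split('|');
            string matchfound = spltifound[0].Replace(".txt", "");
            // Inner loop scans every path for every line: O(n * m) comparisons.
            foreach (string collect in collectionlist)
            {
                string[] splitcollect = collect.Split('\\');
                string matchcollect = splitcollect[3].Replace(".txt", "");
                if (matchcollect == matchfound)
                {
                    end++;
                    long finaldest = (start - end);
                    Console.WriteLine(finaldest);
                    // Opens, appends to and closes copy.txt on every match.
                    File.AppendAllText(@"C:\copy.txt", "copy \"" + collect + "\" \"C:\\OUT\\" + spltifound[1] + "\\" + spltifound[0] + ".txt\"\n");
                    break;
                }
            }
        }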

Sorry for the vagueness, guys.

What I'm trying to do, simply put: if content from one file exists in the other, write out a string (the string itself isn't important; what matters is the time it takes to find the two matching entries). collectionlist is like this:
Apple|Farm

foundlist is like this
C:\cow\horse\turtle.txt
C:\cow\pig\apple.txt

What I'm doing is taking Apple from collectionlist, finding the line in foundlist that contains apple, and then writing out a basic Windows copy batch file. Sorry for the confusion.
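Put differently, both files have to be reduced to the same key before they can be compared. A minimal sketch of that step, assuming the key is the first `|` field on one side and the file name minus `.txt` on the other, and assuming matching should ignore case (the sample pairs `Apple` with `apple.txt`, while both versions of the code compare case-sensitively). `MatchKeys` and its methods are hypothetical names. On Windows, `Path.GetFileNameWithoutExtension` returns the same key for these sample paths; the sketch splits on `\` by hand only so it behaves the same on any OS.

```csharp
using System;

// Hypothetical helper: reduce both line formats to a comparable key.
public static class MatchKeys
{
    // "Apple|Farm" -> "Apple": first '|' field, minus a trailing ".txt" if present.
    public static string FromCollectionLine(string line)
    {
        return StripTxt(line.Split('|')[0]);
    }

    // @"C:\cow\pig\apple.txt" -> "apple": text after the last '\', minus ".txt".
    // Assumes Windows-style paths, as in the question.
    public static string FromPath(string path)
    {
        return StripTxt(path.Substring(path.LastIndexOf('\\') + 1));
    }

    // Assumption: "Apple" and "apple" should count as the same key.
    public static bool SameKey(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    static string StripTxt(string name)
    {
        return name.EndsWith(".txt", StringComparison.OrdinalIgnoreCase)
            ? name.Substring(0, name.Length - ".txt".Length)
            : name;
    }
}
```

With these keys, `Apple|Farm` matches `C:\cow\pig\apple.txt` but not `C:\cow\horse\turtle.txt`.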

Answer (all credit to SLaks)

        string[] foundlist = File.ReadAllLines(@"C:\found.txt");
        // Build the lookup once: key "Apple" -> whole line "Apple|Farm".
        var collection = File.ReadLines(@"C:\collection_export.txt")
            .ToDictionary(s => s.Split('|')[0].Replace(".txt", ""));

        using (var writer = new StreamWriter(@"C:\Copy.txt"))
        {
            foreach (string found in foundlist)
            {
                // @"C:\cow\pig\apple.txt" -> "apple"
                string matchFound = Path.GetFileNameWithoutExtension(found);

                string collectedLine;
                if (collection.TryGetValue(matchFound, out collectedLine))
                {
                    string[] collectlinesplit = collectedLine.Split('|');
                    end++;
                    long finaldest = (start - end);
                    Console.WriteLine(finaldest);
                    writer.WriteLine("copy \"" + found + "\" \"C:\\O\\" + collectlinesplit[1] + "\\" + collectlinesplit[0] + ".txt\"");
                }
            }
        }
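One caveat with `ToDictionary`: it throws an `ArgumentException` the moment two lines produce the same key, which is easy to hit in a huge export (two `Apple|...` lines, say). A sketch of a tolerant alternative, assuming it is acceptable to keep only the first line for each key and to compare keys case-insensitively; `LineIndex.BuildIndex` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Collections.Generic;

public static class LineIndex
{
    // Builds key -> line once, so each later lookup is an expected O(1) hash probe.
    // Unlike ToDictionary, a repeated key does not throw: the first line wins.
    public static Dictionary<string, string> BuildIndex(
        IEnumerable<string> lines, Func<string, string> keySelector)
    {
        // Assumption: "Apple" and "apple" are the same key.
        var index = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string key = keySelector(line);
            if (!index.ContainsKey(key))
            {
                index.Add(key, line);
            }
        }
        return index;
    }
}
```

It would slot in where `ToDictionary` is called above, e.g. `var collection = LineIndex.BuildIndex(File.ReadLines(@"C:\collection_export.txt"), s => s.Split('|')[0].Replace(".txt", ""));`, leaving the `TryGetValue` loop unchanged.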

Solution

  • Call File.ReadLines() (.NET 4) instead of ReadAllLines() (.NET 2.0).
    ReadAllLines has to read the whole file into an array before it returns, which can be extremely slow (and memory-hungry) for large files; ReadLines hands back the lines lazily as you enumerate them.
    If you're not using .NET 4.0, replace it with a StreamReader.

  • Build a Dictionary<string, string> of the matchCollects (once), then loop through the foundList and check whether the dictionary contains matchFound.
    This replaces the O(n) inner loop with an (expected) O(1) hash lookup.

  • Use a single StreamWriter instead of calling File.AppendAllText for every match, which opens and closes the output file each time.

  • EDIT: Call Path.GetFileNameWithoutExtension and the other Path methods instead of manually manipulating strings.

For example:

// foundlist holds the Name|Folder lines (collection_export.txt), as in the question;
// start and end are the question's progress counters.
var collection = File.ReadLines(@"C:\found.txt")
    .ToDictionary(s => Path.GetFileNameWithoutExtension(s));

using (var writer = new StreamWriter(@"C:\Copy.txt")) {
    foreach (string found in foundlist) {
        string[] splitFound = found.Split('|');
        string matchFound = splitFound[0].Replace(".txt", "");

        string collectedLine;
        if (collection.TryGetValue(matchFound, out collectedLine)) {
            end++;
            long finaldest = (start - end);
            Console.WriteLine(finaldest);
            writer.WriteLine("copy \"" + collectedLine + "\" \"C:\\OUT\\"
                           + splitFound[1] + "\\" + splitFound[0] + ".txt\"");
        }
    }
}
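The bullets above can be combined into one self-contained sketch: stream both inputs line by line, build the dictionary once, and write every command through a single writer. It takes `TextReader`/`TextWriter` rather than file paths (an assumption made so it can run on in-memory sample data), treats names case-insensitively, keeps the first path when file names repeat, and reuses the question's `C:\OUT\<folder>\<name>.txt` destination. `CopyScript.Write` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: stream the input, build the dictionary once,
// write through one TextWriter.
public static class CopyScript
{
    // Reads one line at a time; a stand-in for File.ReadLines that also
    // works before .NET 4 (iterators are available since C# 2.0).
    static IEnumerable<string> Lines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // "apple.txt" -> "apple"; names without ".txt" are returned unchanged.
    static string StripTxt(string name)
    {
        return name.EndsWith(".txt", StringComparison.OrdinalIgnoreCase)
            ? name.Substring(0, name.Length - ".txt".Length)
            : name;
    }

    // Writes one batch "copy" command per match and returns how many were written.
    public static int Write(TextReader paths, TextReader collection, TextWriter output)
    {
        // Assumption: names match case-insensitively ("Apple" vs "apple.txt").
        var byName = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in Lines(paths))
        {
            // Text after the last '\' (Windows-style paths, as in the question).
            string name = StripTxt(path.Substring(path.LastIndexOf('\\') + 1));
            if (!byName.ContainsKey(name))
            {
                byName.Add(name, path); // first path wins on duplicate names
            }
        }

        int written = 0;
        foreach (string line in Lines(collection))
        {
            string[] parts = line.Split('|'); // "Apple|Farm" -> ["Apple", "Farm"]
            if (parts.Length < 2)
            {
                continue; // skip malformed lines instead of throwing
            }

            string source;
            // One expected O(1) hash lookup instead of scanning every path.
            if (byName.TryGetValue(StripTxt(parts[0]), out source))
            {
                output.WriteLine("copy \"" + source + "\" \"C:\\OUT\\"
                    + parts[1] + "\\" + parts[0] + ".txt\"");
                written++;
            }
        }
        return written;
    }
}
```

With real files it could be driven by `new StreamReader(@"C:\found.txt")`, `new StreamReader(@"C:\collection_export.txt")` and `new StreamWriter(@"C:\Copy.txt")`, each in a `using` block; a plain `StreamReader` works on .NET versions before 4.0 as well. Only the paths file is held in memory, so if the other file is the smaller of the two, indexing that one instead keeps memory use down.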
