C# code or algorithm to quickly calculate distance between large strings?
Question
Hi, and thanks for looking!
I have an XML file that contains 1900 nodes, each of which contains a string of encoded data about 3400 characters long.
As part of a use case for an application I am developing, I need to be able to take a "benchmark" string at runtime and find the closest match in the XML file.
Please note that XML is not germane to the app and that I may go with SQL moving forward, but for today I just need an easy place to store the data and prove the concept.
I am using .NET 4.0, C#, a forms app, LINQ, etc.
How do I find the closest match? Hamming? Levenshtein? There are plenty of code samples online, but most are geared towards comparing small strings ("ant" vs. "aunt") or finding exact matches. I will rarely have an exact match; I just need the closest one.
Thanks in advance!
Recommended answer
You mentioned using Levenshtein's edit distance and that your strings are about 3400 characters long.
I did a quick test, and the dynamic programming version of Levenshtein's edit distance seems to be quite fast and causes no issues.
I did this:
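    import java.util.Random;

    // Build two random lowercase strings of 3400 characters each (fixed seed for repeatability).
    final StringBuilder sb1 = new StringBuilder();
    final StringBuilder sb2 = new StringBuilder();
    final Random r = new Random(42);
    final int n = 3400;
    for (int i = 0; i < n; i++) {
        sb1.append( (char) ('a' + r.nextInt(26)) );
        sb2.append( (char) ('a' + r.nextInt(26)) );
    }
    // Time a single comparison. getLevenshteinDistance is the dynamic programming
    // implementation of the edit distance; it is not shown here.
    final long t0 = System.currentTimeMillis();
    System.out.println("LED: " + getLevenshteinDistance(sb1.toString(), sb2.toString()) );
    final long te = System.currentTimeMillis() - t0;
    System.out.println("Took: " + te + " ms");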
It finds the distance in 215 ms on a Core 2 Duo from 2006 or so.
Would that work for you?
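(By the way, I'm not sure I can paste the code for the DP Levenshtein edit distance implementation I have here, so you should search online for a Java implementation.)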
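For readers who want something to start from, here is a minimal sketch of the standard two-row dynamic programming form of Levenshtein distance, plus a linear scan that picks the closest candidate. This is not the implementation timed above. The class name LevenshteinSketch and the methods distance and closestIndex are hypothetical names chosen for illustration. It is written in Java to match the snippet above; the same loops should port to C# with only minor changes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class and method names; an illustrative sketch,
// not the implementation benchmarked in the answer.
class LevenshteinSketch {

    // Two-row dynamic programming: O(a.length * b.length) time, O(b.length) memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the last finished row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Linear scan: index of the candidate with the smallest distance to the
    // benchmark, or -1 if the list is empty. Ties keep the earliest candidate.
    static int closestIndex(String benchmark, List<String> candidates) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < candidates.size(); i++) {
            int d = distance(benchmark, candidates.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "ant" -> "aunt" needs a single insertion.
        System.out.println(distance("ant", "aunt"));
        // Stand-ins for the strings stored in the XML nodes.
        List<String> nodes = Arrays.asList("kitten", "sitting", "mitten");
        System.out.println(closestIndex("bitten", nodes));
    }
}
```

Keeping only two rows avoids allocating a full 3400 x 3400 table for each comparison. Note that a scan over all 1900 nodes runs one full comparison per node, so the single-pair timing above multiplies accordingly.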