How to compare Strings for Percentage Match using vb.net?


Question

I have been banging my head against the wall for a while now, trying different techniques.

None of them work well.

I have two strings.

I need to compare them and get an exact percentage of match,

e.g. "four score and seven years ago" TO "for scor and sevn yeres ago"

Well, I first started by comparing every word to every word, tracking every hit, with percentage = count \ numOfWords. Nope, that didn't take misspelled words into account.

("four" <> "for" even though it is close)

Then I tried comparing the strings char by char, advancing the position in the other string when there was no match (to account for misspellings). But I would get false hits, because the first string could contain every char of the second, just not in the same order. ("stuff avail" <> "stu vail", but it would come back as a hit, with a low percentage: 9 \ 11 = 81%.)

So I then tried comparing PAIRS of chars in each string. If string1[i] = string2[k] AND string1[i+1] = string2[k+1], increment the count, and increment "k" when it doesn't match (to track misspellings; "for" and "four" should come back with a 75% hit). That doesn't seem to work either. It is getting closer, but even with an exact match it only returns 94%. And it really gets screwed up when something is badly misspelled. (Code at the bottom.)

Any thoughts or directions?

Code

count = 0
j = 0
k = 0
While j < strTempName.Length - 2 And k < strTempFile.Length - 2
    ' Skip a character in the first string that is not a letter
    If Not Char.IsLetter(strTempName(j)) Then
        j += 1
    End If

    ' Skip a character in the second string that is not a letter
    If Not Char.IsLetter(strTempFile(k)) Then
        k += 1
    End If

    ' Compare a pair of chars
    While (strTempName(j) <> strTempFile(k) And _ 
           strTempName(j + 1) <> strTempFile(k + 1) And _ 
           k < strTempFile.Length - 2)
        k += 1
    End While
    count += 1
    j += 1
    k += 1

End While

perc = count / (strTempName.Length - 1)

Recommended answer

Edit: I have been doing some research, and I think I originally found the code from here and translated it to VB.NET years ago. It uses the Levenshtein string matching algorithm.
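For reference, the Levenshtein distance mentioned above can be written as a recurrence. The notation here is my own and not taken from the answer: $D_{i,j}$ is the distance between the first $i$ characters of $s$ and the first $j$ characters of $t$, with $n$ and $m$ the two string lengths.

```latex
\begin{aligned}
D_{i,0} &= i, \qquad D_{0,j} = j, \\
D_{i,j} &= \min\bigl(D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; D_{i-1,j-1} + c_{i,j}\bigr),
\qquad c_{i,j} = \begin{cases} 0 & \text{if } s_i = t_j \\ 1 & \text{otherwise} \end{cases} \\
\text{similarity} &= 1 - \frac{D_{n,m}}{\max(n, m)}
\end{aligned}
```

For the two strings in the question, the lengths are 30 and 27 and the distance works out to 5 (three dropped letters, plus two substitutions to turn "years" into "yeres"), so the similarity is 1 - 5/30, or about 83%.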

Here is the code I use for that, hope it helps:

Sub Main()
    Dim string1 As String = "four score and seven years ago"
    Dim string2 As String = "for scor and sevn yeres ago"
    Dim similarity As Single = GetSimilarity(string1, string2)
    ' Result: about 0.83 (edit distance 5; the longer string has 30 characters)
End Sub

' Returns a value from 0.0 (nothing in common) to 1.0 (identical):
' 1 - (edit distance / length of the longer string).
Public Function GetSimilarity(string1 As String, string2 As String) As Single
    Dim dis As Single = ComputeDistance(string1, string2)
    Dim maxLen As Single = Math.Max(string1.Length, string2.Length)
    If maxLen = 0.0F Then
        ' Two empty strings are identical
        Return 1.0F
    End If
    Return 1.0F - dis / maxLen
End Function

' Levenshtein distance: the minimum number of single-character insertions,
' deletions and substitutions needed to turn s into t.
Private Function ComputeDistance(s As String, t As String) As Integer
    Dim n As Integer = s.Length
    Dim m As Integer = t.Length
    If n = 0 Then
        Return m
    End If
    If m = 0 Then
        Return n
    End If

    ' distance(i, j) = distance between the first i chars of s and the first j chars of t
    Dim distance As Integer(,) = New Integer(n, m) {}

    ' Initialise the first column and the first row
    For i As Integer = 0 To n
        distance(i, 0) = i
    Next
    For j As Integer = 0 To m
        distance(0, j) = j
    Next

    ' Find the minimum distance
    For i As Integer = 1 To n
        For j As Integer = 1 To m
            Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)
            distance(i, j) = Math.Min(distance(i - 1, j) + 1,
                                      Math.Min(distance(i, j - 1) + 1,
                                               distance(i - 1, j - 1) + cost))
        Next
    Next
    Return distance(n, m)
End Function
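The question's code tries to skip non-letter characters and wants the result as a percentage. Below is a minimal sketch of one way to combine that with the same Levenshtein idea. It assumes case and punctuation should be ignored, which the question does not state explicitly. The names `NormalizeText`, `EditDistance` and `PercentMatch` are hypothetical and not from the answer. It keeps only two rows of the matrix instead of the full table, which is a common memory-saving variant and not something the answer does.

```vbnet
Imports System
Imports System.Text

Module PercentMatchSketch
    ' Hypothetical helper: keeps only letters and digits, lower-cased.
    Function NormalizeText(text As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In text
            If Char.IsLetterOrDigit(c) Then
                sb.Append(Char.ToLowerInvariant(c))
            End If
        Next
        Return sb.ToString()
    End Function

    ' Levenshtein distance using two rows instead of a full matrix.
    Function EditDistance(s As String, t As String) As Integer
        Dim previous(t.Length) As Integer
        Dim current(t.Length) As Integer
        For j As Integer = 0 To t.Length
            previous(j) = j
        Next
        For i As Integer = 1 To s.Length
            current(0) = i
            For j As Integer = 1 To t.Length
                Dim cost As Integer = If(s(i - 1) = t(j - 1), 0, 1)
                current(j) = Math.Min(Math.Min(previous(j) + 1, current(j - 1) + 1),
                                      previous(j - 1) + cost)
            Next
            ' The row just filled becomes the "previous" row for the next pass
            Dim tmp As Integer() = previous
            previous = current
            current = tmp
        Next
        Return previous(t.Length)
    End Function

    ' Percentage match (0 to 100) of the normalized strings.
    Function PercentMatch(a As String, b As String) As Double
        Dim x As String = NormalizeText(a)
        Dim y As String = NormalizeText(b)
        Dim longest As Integer = Math.Max(x.Length, y.Length)
        If longest = 0 Then
            Return 100.0
        End If
        Return 100.0 * (1.0 - EditDistance(x, y) / longest)
    End Function

    Sub Main()
        Console.WriteLine(PercentMatch("four score and seven years ago", "for scor and sevn yeres ago"))
        Console.WriteLine(PercentMatch("Stuff Avail", "stuff-avail"))
    End Sub
End Module
```

Because spaces are stripped before comparing, the percentage will differ somewhat from a comparison of the raw strings. Whether that is desirable depends on how much word boundaries should count.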
