Coding challenge: comparing names intelligently

Problem description

This simple one is suggested by Brent Hoskisson.

Create a method that will compare two names intelligently. "Intelligently" means that it needs to take into account the different forms a name may take. For example:

John Paul Smith


Would match with

John Paul Smith
Smith John Paul
John P Smith
Smith John P
J Paul Smith
Smith J Paul
John Smith
Smith John


Whether you choose to use a binary match/no match, or a score indicating a degree of certainty of the match is up to you.

Points awarded for brevity of code. No restrictions on the number of different entries per challenger.

Note: I'm away next Friday. Can someone in the peanut gallery please post a challenge next Friday (24 March)?

Solution

Let us first think about and specify the requirements, in this case using Specification by Example, coded directly as executable unit tests:

[TestMethod]
public void Test_NameMatches()
{
    Assert.IsTrue("John Paul Smith".Matches("Smith John"));    // 1
    Assert.IsTrue("Smith John".Matches("Smith john"));         // 2   
    Assert.IsTrue("Smith Smith".Matches("Smith Smith"));       // 3  
    Assert.IsTrue("Smith Smith".Matches(" Smith Paul Smith "));// 4
    Assert.IsTrue("Smith John".Matches("john S"));             // 5
    Assert.IsTrue("J Smith".Matches("John Smith" ));           // 6
    Assert.IsTrue(" J Paul  Smith ".Matches("John"));          // 7 
    Assert.IsTrue("Smith, J.P".Matches("John Paul Smith"));    // 8
    Assert.IsTrue("John Smith".Matches("John Sm"));            // 9

    Assert.IsFalse("John Jonsson".Matches("John Smith"));      // 10
    Assert.IsFalse("John Smith".Matches( "John Jonsson"));     // 11
}

Notice that I have included cases with duplicate names (3 and 4), abbreviations (6-8), case insensitivity (2 and 5), extra white space (4, 7) and punctuation (8). I also chose to match incomplete names, such as case 9, to support auto-completion scenarios. For code readability I chose to use a String extension method.
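The white-space and punctuation cases can be looked at in isolation. The following is a minimal sketch, assuming the same separator set the solution uses (space, tab, period and comma); `TokenizeDemo` and `Tokenize` are names made up for this illustration and are not part of the answer.

```csharp
using System;

static class TokenizeDemo
{
    // Assumed separator set: space, tab, period and comma.
    static readonly char[] Separators = { ' ', '\t', '.', ',' };

    // Splits a name into its parts and drops the empty entries that
    // repeated separators such as ", " or "  " would otherwise produce.
    public static string[] Tokenize(string name) =>
        name.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    static void Main()
    {
        Console.WriteLine(string.Join("|", Tokenize("Smith, J.P")));      // Smith|J|P
        Console.WriteLine(string.Join("|", Tokenize(" J Paul  Smith "))); // J|Paul|Smith
    }
}
```

After this step, test cases 4, 7 and 8 reduce to comparing plain lists of name parts.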

A concise solution that fulfills the above requirements is:

using System;
using System.Collections.Generic;
using System.Linq;

static class Names
{
    public static bool Matches(this String name1, String name2)
    {
       var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
       var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
       // Every part of the name with fewer parts must be found among the parts of the other name.
       return names1.Length < names2.Length ? !names1.Except(names2, Comparer).Any() : !names2.Except(names1, Comparer).Any();
    }

    public static char[] Separators = { ' ', '\t', '.', ',' };
}


For comparison it uses Comparer, which is defined as an instance of a nested class:

    private static readonly NameComparer Comparer = new NameComparer();

    private class NameComparer : IEqualityComparer<String>
    {
        // Two name parts are equal when the shorter one is a case-insensitive prefix of the longer one.
        public bool Equals(string x, string y)
        {
            return String.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;
        }

        // Parts that compare equal share their first letter (ignoring case), so hash on that letter alone.
        public int GetHashCode(string obj)
        {
            return Char.ToUpper(obj[0]).GetHashCode();
        }
    }
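To see what the prefix rule does in practice, here is a small stand-alone sketch. `PrefixComparer` is a hypothetical copy of the nested comparer, made public only so that it can be exercised directly. It also shows one property to keep in mind: the relation is not transitive, because "J" equals both "John" and "Jonsson" while those two differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone copy of the nested comparer, for experimentation only.
sealed class PrefixComparer : IEqualityComparer<string>
{
    // Equal when the shorter string is a case-insensitive prefix of the longer one.
    public bool Equals(string x, string y) =>
        string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;

    // Hash on the upper-cased first letter so that equal parts land in the same bucket.
    public int GetHashCode(string obj) => char.ToUpper(obj[0]).GetHashCode();
}

static class PrefixComparerDemo
{
    static void Main()
    {
        var c = new PrefixComparer();
        Console.WriteLine(c.Equals("J", "John"));       // True  (abbreviation)
        Console.WriteLine(c.Equals("john", "JOHN"));    // True  (case is ignored)
        Console.WriteLine(c.Equals("Sm", "Smith"));     // True  (incomplete name)
        Console.WriteLine(c.Equals("J", "Jonsson"));    // True
        Console.WriteLine(c.Equals("John", "Jonsson")); // False (so the relation is not transitive)
    }
}
```

Set-based operations such as `Except` normally assume a true equivalence relation, so with very short abbreviations the outcome of a match may be more permissive than expected.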

To support matching of a single name I also added the following method which takes a whole collection as input and returns all matches:

    public static IEnumerable<String> GetAllMatches(this String name1, IEnumerable<String> dictionary)
    {
       var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
       foreach (var name2 in dictionary)
       {
          var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
          if (names1.Length < names2.Length ? !names1.Except(names2, Comparer).Any() : !names2.Except(names1, Comparer).Any())
          {
              yield return name2;
          }
       }
    }
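For readers who want to try the auto-completion scenario, the pieces can be assembled into one compilable file. This is only a sketch: the class names `NameMatching` and `PrefixComparer`, the `Tokenize` helper and the sample directory are assumptions made for this illustration, and `GetAllMatches` is expressed through `Matches` instead of repeating the comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NameMatching
{
    static readonly char[] Separators = { ' ', '\t', '.', ',' };
    static readonly PrefixComparer Comparer = new PrefixComparer();

    // Name parts are equal when one is a case-insensitive prefix of the other.
    sealed class PrefixComparer : IEqualityComparer<string>
    {
        public bool Equals(string x, string y) =>
            string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;

        public int GetHashCode(string obj) => char.ToUpper(obj[0]).GetHashCode();
    }

    static string[] Tokenize(string name) =>
        name.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // True when every part of the name with fewer parts is found in the other name.
    public static bool Matches(this string name1, string name2)
    {
        var a = Tokenize(name1);
        var b = Tokenize(name2);
        var (shorter, longer) = a.Length < b.Length ? (a, b) : (b, a);
        return !shorter.Except(longer, Comparer).Any();
    }

    // Returns every candidate that matches the given (possibly incomplete) name.
    public static IEnumerable<string> GetAllMatches(this string name, IEnumerable<string> candidates) =>
        candidates.Where(candidate => name.Matches(candidate));
}

static class Program
{
    static void Main()
    {
        // Made-up sample data for the illustration.
        var directory = new[] { "John Paul Smith", "Smith, J.P", "John Jonsson", "Paula Smith" };

        // Prints "John Paul Smith" and "Smith, J.P".
        foreach (var hit in "John Sm".GetAllMatches(directory))
            Console.WriteLine(hit);
    }
}
```

Typing "John Sm" keeps the two Smith entries that contain a John or a J, and drops "John Jonsson" and "Paula Smith".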


Does it pass the peer review?


I also have an idea for this challenge. In my opinion the answer of such a method cannot simply be "True" or "False"; it must be a value that indicates how good the match is ...

First, the method:

Function NameCompare(SourceName As String, NameToCompare As String) As Single
     'Rules:
     'An empty string gives 0.0 as the result
     'If both strings are identical, the result is 1.0 = 100% match
     'Other cases:
     ' - 1 point for each letter of the common prefix of two name parts
     ' - 1 additional point if the name part is exactly equal
     ' - 1 additional point if the name part is exactly equal and at the right position
     ' - if a part of NameToCompare is longer than the SourceName part it is compared with, each extra letter reduces the points by 0.25

     If SourceName.Trim = "" Or NameToCompare.Trim = "" Then Return 0.0

     Dim SourceNameArray() As String = SourceName.Split(" ")
     Dim NameToCompareArray() As String = NameToCompare.Split(" ")

     Dim maxPoints As Integer = SourceName.Replace(" ", "").Length + SourceNameArray.Length * 2
     Dim givenPoints As Single = 0.0
     Dim i, j, k As Integer

     For i = 0 To SourceNameArray.Length - 1
         For j = 0 To NameToCompareArray.Length - 1
             If SourceNameArray(i) = NameToCompareArray(j) Then
                 givenPoints += NameToCompareArray(j).Length
                 givenPoints += 1
                 If i = j Then givenPoints += 1
                 Exit For
             ElseIf SourceNameArray(i).Length >= NameToCompareArray(j).Length Then
                 For k = 1 To NameToCompareArray(j).Length
                     If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                 Next
             ElseIf SourceNameArray(i).Length < NameToCompareArray(j).Length Then
                 For k = 1 To SourceNameArray(i).Length
                     If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                 Next
                 givenPoints -= CSng(NameToCompareArray(j).Length - SourceNameArray(i).Length) * 0.25
             End If
         Next
     Next

     If givenPoints = 0 Or maxPoints = 0 Then Return 0.0
     Return givenPoints / CSng(maxPoints)
 End Function
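To make the scoring approach directly comparable with the C# answer, here is a sketch of a C# port of the function above. It is intended as a line-by-line translation and not as an improved algorithm. The class name `NameScore` is an assumption, and the two `ElseIf` branches are merged because they differ only in the loop bound and the penalty.

```csharp
using System;
using System.Globalization;

static class NameScore
{
    public static float NameCompare(string sourceName, string nameToCompare)
    {
        if (sourceName.Trim() == "" || nameToCompare.Trim() == "") return 0f;

        string[] source = sourceName.Split(' ');
        string[] other = nameToCompare.Split(' ');

        // Best possible score: every letter plus two bonus points per name part.
        int maxPoints = sourceName.Replace(" ", "").Length + source.Length * 2;
        float givenPoints = 0f;

        for (int i = 0; i < source.Length; i++)
        {
            for (int j = 0; j < other.Length; j++)
            {
                if (source[i] == other[j])
                {
                    givenPoints += other[j].Length + 1;  // all letters plus the "exactly equal" bonus
                    if (i == j) givenPoints += 1;        // bonus for the right position
                    break;
                }

                // One point per letter of the common prefix.
                int shared = Math.Min(source[i].Length, other[j].Length);
                for (int k = 1; k <= shared; k++)
                    if (source[i].Substring(0, k) == other[j].Substring(0, k)) givenPoints += 1;

                // Penalty of 0.25 per extra letter when the compared part is longer.
                if (source[i].Length < other[j].Length)
                    givenPoints -= (other[j].Length - source[i].Length) * 0.25f;
            }
        }

        if (givenPoints == 0 || maxPoints == 0) return 0f;
        return givenPoints / maxPoints;
    }

    static void Main()
    {
        float score = NameCompare("John Paul Smith", "Smith John Paul");
        Console.WriteLine(score.ToString("0.00", CultureInfo.InvariantCulture)); // 0.82
    }
}
```

The sketch formats with the invariant culture, so it prints a decimal point; a decimal comma in output from the original function would come from the culture settings of the machine it ran on.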



Now the test with several names to compare:

Dim n0, nc As String
n0 = "John Paul Smith"

nc = "John Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "John P Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John P"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "J Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith J Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "John Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith John"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "J P Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Paul Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Smith Paul"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))

nc = "Paula Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Josephine Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
nc = "Johny Smith"
Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))



And here are the results:

John Paul Smith :: John Paul Smith -> 1,00
John Paul Smith :: Smith John Paul -> 0,82
John Paul Smith :: John P Smith -> 0,72
John Paul Smith :: Smith John P -> 0,61
John Paul Smith :: J Paul Smith -> 0,72
John Paul Smith :: Smith J Paul -> 0,61
John Paul Smith :: John Smith -> 0,62
John Paul Smith :: Smith John -> 0,55
John Paul Smith :: J P Smith -> 0,45
John Paul Smith :: Paul Smith -> 0,57
John Paul Smith :: Smith Paul -> 0,61
John Paul Smith :: Paula Smith -> 0,47
John Paul Smith :: Josephine Smith -> 0,21
John Paul Smith :: Johny Smith -> 0,47
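As a check on how these numbers arise, the second line (Smith John Paul, 0,82) can be worked through by hand from the rules in the function. Each part of the source name is compared with the parts of the other name until an exact match is found, and a longer non-matching part costs 0.25 per extra letter:

```latex
\begin{aligned}
\text{maxPoints} &= 13 \;(\text{letters in JohnPaulSmith}) + 3 \times 2 = 19 \\
\text{John}  &: \; -0.25 \;(\text{vs. Smith}) + (4 + 1) \;(\text{exact match, other position}) = 4.75 \\
\text{Paul}  &: \; -0.25 \;(\text{vs. Smith}) + 0 \;(\text{vs. John}) + (4 + 1) \;(\text{exact match}) = 4.75 \\
\text{Smith} &: \; 5 + 1 \;(\text{exact match, other position}) = 6 \\
\text{score} &= \frac{4.75 + 4.75 + 6}{19} = \frac{15.5}{19} \approx 0.82
\end{aligned}
```

The same bookkeeping gives 11.75 / 19, about 0.62, for John Smith.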

