Coding challenge: comparing names intelligently
This simple one is suggested by Brent Hoskisson.
Create a method that will compare two names intelligently. "Intelligently" means that it needs to take into account the different forms a name may take. For example:
John Paul Smith
Would match with
John Paul Smith
Smith John Paul
John P Smith
Smith John P
J Paul Smith
Smith J Paul
John Smith
Smith John
Whether you choose to use a binary match/no match, or a score indicating a degree of certainty of the match is up to you.
Points awarded for brevity of code. No restrictions on the number of different entries per challenger.
Note: next week I'm away Friday. Can someone in the peanut gallery please post a challenge next Friday (24 March)?
Let us first think about and specify the requirements, in this case using Specification by Example, directly coded as executable unit tests:

    [TestMethod]
    public void Test_NameMatches()
    {
        Assert.IsTrue("John Paul Smith".Matches("Smith John"));           // 1
        Assert.IsTrue("Smith John".Matches("Smith john"));                // 2
        Assert.IsTrue("Smith Smith".Matches("Smith Smith"));              // 3
        Assert.IsTrue("Smith Smith".Matches(" Smith Paul Smith "));       // 4
        Assert.IsTrue("Smith John".Matches("john S"));                    // 5
        Assert.IsTrue("J Smith".Matches("John Smith"));                   // 6
        Assert.IsTrue(" J Paul Smith ".Matches("John"));                  // 7
        Assert.IsTrue("Smith, J.P".Matches("John Paul Smith"));           // 8
        Assert.IsTrue("John Smith".Matches("John Sm"));                   // 9
        Assert.IsFalse("John Jonsson".Matches("John Smith"));             // 10
        Assert.IsFalse("John Smith".Matches("John Jonsson"));             // 11
    }

Notice that I have included cases with duplicate names (3 and 4), abbreviations (6-8), case insensitivity (2 and 5), extra white space (4 and 7) and punctuation (8). I also chose to match incomplete names, such as case 9, to support auto-completion scenarios. For code readability I chose to go for a String extension method.
A concise solution that fulfills the above requirements is:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Names
    {
        public static bool Matches(this String name1, String name2)
        {
            // Split both names into their parts, dropping empty entries.
            var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

            // Match when every part of the shorter name has a counterpart in the longer one.
            return names1.Length < names2.Length
                ? !names1.Except(names2, Comparer).Any()
                : !names2.Except(names1, Comparer).Any();
        }

        public static char[] Separators = { ' ', '\t', '.', ',' };
    }
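The `!a.Except(b, comparer).Any()` expression in Matches reads as "no part of a is missing from b". The following is only a minimal sketch of that idiom on its own; it assumes the built-in StringComparer.OrdinalIgnoreCase as a stand-in comparer, and SubsetDemo and AllContained are hypothetical names, not part of the solution.

```csharp
using System;
using System.Linq;

static class SubsetDemo
{
    // True when every token of `smaller` also occurs in `larger`
    // (case-insensitive, whole tokens only).
    public static bool AllContained(string[] smaller, string[] larger) =>
        !smaller.Except(larger, StringComparer.OrdinalIgnoreCase).Any();

    static void Main()
    {
        var full = new[] { "John", "Paul", "Smith" };

        Console.WriteLine(AllContained(new[] { "smith", "john" }, full));   // True
        Console.WriteLine(AllContained(new[] { "John", "Jonsson" }, full)); // False
        // Whole-token comparison cannot see that "J" abbreviates "John".
        Console.WriteLine(AllContained(new[] { "J", "Smith" }, full));      // False
    }
}
```

The last call shows why a plain case-insensitive comparer is not enough for the abbreviation cases (6-8): those need a comparer that treats a prefix as equal.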
For comparison, Matches utilizes Comparer, which is defined as an instance of a nested class:

    private static readonly NameComparer Comparer = new NameComparer();

    private class NameComparer : IEqualityComparer<String>
    {
        // Two name parts are equal when the shorter one is a
        // case-insensitive prefix of the longer one.
        public bool Equals(string x, string y)
        {
            return String.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;
        }

        // Parts that are equal by prefix always share their first letter, so hash on that.
        public int GetHashCode(string obj)
        {
            return Char.ToUpper(obj[0]).GetHashCode();
        }
    }

To support matching of a single name, I also added the following method, which takes a whole collection as input and returns all matches:

    public static IEnumerable<String> GetAllMatches(this String name1, IEnumerable<String> dictionary)
    {
        var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        foreach (var name2 in dictionary)
        {
            var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
            if (names1.Length < names2.Length
                ? !names1.Except(names2, Comparer).Any()
                : !names2.Except(names1, Comparer).Any())
            {
                yield return name2;
            }
        }
    }
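Since the pieces are shown separately, here is a sketch of how they might fit together in one compilable file, with an auto-completion style lookup. Two things are assumptions of this sketch rather than part of the answer: GetAllMatches is shortened to reuse Matches, and the Program class with its `directory` entries is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Names
{
    public static readonly char[] Separators = { ' ', '\t', '.', ',' };
    private static readonly NameComparer Comparer = new NameComparer();

    public static bool Matches(this string name1, string name2)
    {
        var names1 = name1.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        var names2 = name2.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        // Every part of the shorter name must have a counterpart in the longer one.
        return names1.Length < names2.Length
            ? !names1.Except(names2, Comparer).Any()
            : !names2.Except(names1, Comparer).Any();
    }

    // Assumption: delegate to Matches instead of repeating its body.
    public static IEnumerable<string> GetAllMatches(this string name1, IEnumerable<string> dictionary) =>
        dictionary.Where(name2 => name1.Matches(name2));

    private class NameComparer : IEqualityComparer<string>
    {
        // Equal when the shorter part is a case-insensitive prefix of the longer one.
        public bool Equals(string x, string y) =>
            String.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length), ignoreCase: true) == 0;

        // Prefix-equal parts share their first letter, so they land in the same hash bucket.
        public int GetHashCode(string obj) => Char.ToUpper(obj[0]).GetHashCode();
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical directory entries, for illustration only.
        var directory = new[] { "John Paul Smith", "Smith, J.P", "John Jonsson", "Paula Smith" };

        foreach (var hit in "John Sm".GetAllMatches(directory))
            Console.WriteLine(hit);
        // Prints:
        // John Paul Smith
        // Smith, J.P
    }
}
```

One consequence of prefix equality worth keeping in mind: any part that is a prefix of another counts as equal, so "Paul Smith" also matches "Paula Smith".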
Does it pass the peer review?
I also have an idea for this challenge. In my opinion, the answer of such a method cannot be just "True" or "False"; it must be a value that expresses how good the match is ...
First, the method:

    Function NameCompare(SourceName As String, NameToCompare As String) As Single
        ' Rules:
        ' An empty String gives 0.0 as result.
        ' If both Strings are identical, the result is 1.0 = 100% match.
        ' Other cases:
        ' - I give 1 point for each letter which is equal in the String parts, beginning from the start
        ' - I give an additional point if the String part is completely equal
        ' - I give an additional point if the String part is completely equal and at the right position
        ' - if a NameToCompare part is longer than the SourceName part it is compared with,
        '   each extra letter reduces the given points by 0.25

        If SourceName.Trim = "" Or NameToCompare.Trim = "" Then Return 0.0

        Dim SourceNameArray() As String = SourceName.Split(" ")
        Dim NameToCompareArray() As String = NameToCompare.Split(" ")

        Dim maxPoints As Integer = SourceName.Replace(" ", "").Length + SourceNameArray.Length * 2
        Dim givenPoints As Single = 0.0
        Dim i, j, k As Integer

        For i = 0 To SourceNameArray.Length - 1
            For j = 0 To NameToCompareArray.Length - 1
                If SourceNameArray(i) = NameToCompareArray(j) Then
                    givenPoints += NameToCompareArray(j).Length
                    givenPoints += 1
                    If i = j Then givenPoints += 1
                    Exit For
                ElseIf SourceNameArray(i).Length >= NameToCompareArray(j).Length Then
                    For k = 1 To NameToCompareArray(j).Length
                        If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                    Next
                ElseIf SourceNameArray(i).Length < NameToCompareArray(j).Length Then
                    For k = 1 To SourceNameArray(i).Length
                        If SourceNameArray(i).Substring(0, k) = NameToCompareArray(j).Substring(0, k) Then givenPoints += 1
                    Next
                    givenPoints -= CSng(NameToCompareArray(j).Length - SourceNameArray(i).Length) * 0.25
                End If
            Next
        Next

        If givenPoints = 0 Or maxPoints = 0 Then Return 0.0
        Return givenPoints / CSng(maxPoints)
    End Function
Now the test with several names to compare:

    Dim n0 As String = "John Paul Smith"
    Dim candidates() As String = {
        "John Paul Smith", "Smith John Paul", "John P Smith", "Smith John P",
        "J Paul Smith", "Smith J Paul", "John Smith", "Smith John",
        "J P Smith", "Paul Smith", "Smith Paul", "Paula Smith",
        "Josephine Smith", "Johny Smith"
    }

    For Each nc As String In candidates
        Console.WriteLine(n0 + " :: " + nc + " -> " + NameCompare(n0, nc).ToString("0.00"))
    Next
And here are the results:

    John Paul Smith :: John Paul Smith -> 1,00
    John Paul Smith :: Smith John Paul -> 0,82
    John Paul Smith :: John P Smith -> 0,72
    John Paul Smith :: Smith John P -> 0,61
    John Paul Smith :: J Paul Smith -> 0,72
    John Paul Smith :: Smith J Paul -> 0,61
    John Paul Smith :: John Smith -> 0,62
    John Paul Smith :: Smith John -> 0,55
    John Paul Smith :: J P Smith -> 0,45
    John Paul Smith :: Paul Smith -> 0,57
    John Paul Smith :: Smith Paul -> 0,61
    John Paul Smith :: Paula Smith -> 0,47
    John Paul Smith :: Josephine Smith -> 0,21
    John Paul Smith :: Johny Smith -> 0,47
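To see how one of these scores arises, take "Smith John Paul". The maximum is 13 letters + 3 parts × 2 = 19 points. "John" and "Paul" each earn their length plus one (5 points) but each loses 0.25 for first being compared with the longer "Smith"; "Smith" earns 6. No part sits at its original position, so there is no position bonus: 15.5 / 19 ≈ 0.82. The following is a hypothetical C# port of NameCompare (NameScore and Compare are names invented for this sketch) that can be used to check the arithmetic. It formats with the invariant culture, so it prints decimal points; the decimal commas above presumably come from regional settings.

```csharp
using System;
using System.Globalization;

static class NameScore
{
    // Hypothetical C# port of the VB NameCompare function; same rules, same order of evaluation.
    public static float Compare(string source, string candidate)
    {
        if (source.Trim() == "" || candidate.Trim() == "") return 0f;

        string[] src = source.Split(' ');
        string[] cand = candidate.Split(' ');
        int maxPoints = source.Replace(" ", "").Length + src.Length * 2;
        float given = 0f;

        for (int i = 0; i < src.Length; i++)
        {
            for (int j = 0; j < cand.Length; j++)
            {
                if (src[i] == cand[j])
                {
                    given += cand[j].Length + 1;   // one point per letter, plus one for the full match
                    if (i == j) given += 1;        // bonus for the same position
                    break;                         // corresponds to Exit For
                }

                // One point for each length at which the leading letters agree.
                int shared = Math.Min(src[i].Length, cand[j].Length);
                for (int k = 1; k <= shared; k++)
                    if (src[i].Substring(0, k) == cand[j].Substring(0, k)) given += 1;

                // Penalty of 0.25 per extra letter when the candidate part is longer.
                if (cand[j].Length > src[i].Length)
                    given -= (cand[j].Length - src[i].Length) * 0.25f;
            }
        }

        if (given == 0f || maxPoints == 0) return 0f;
        return given / maxPoints;
    }
}

static class Program
{
    static void Main()
    {
        const string n0 = "John Paul Smith";
        foreach (var nc in new[] { "John Paul Smith", "Smith John Paul", "Paula Smith" })
            Console.WriteLine(n0 + " :: " + nc + " -> "
                + NameScore.Compare(n0, nc).ToString("0.00", CultureInfo.InvariantCulture));
        // Prints 1.00, 0.82 and 0.47 for the three candidates.
    }
}
```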