How do I determine if two similar band names represent the same band?


Problem description

I'm currently working on a project that requires me to match our database of bands and venues against a number of external services.

Basically, I'm looking for some direction on the best method for determining whether two names are the same. For example:


  • Our database venue name - "The Pig and Whistle"
  • Service 1 - "Pig and Whistle"
  • Service 2 - "The Pig & Whistle"
  • etc.

I think the main differences are going to be things like a missing "the" or "&" used instead of "and", but there could also be slightly different spellings and words in a different order.

What algorithms or techniques are commonly used in this situation? Do I need to filter out noise words or do some sort of spell-check-style match?

Have you seen any examples of something similar in C#?

UPDATE: In case anyone is interested in a C# example, there are plenty you can find by doing a Google code search for "Levenshtein distance".

Recommended answer

The canonical (and probably the easiest) way to do this is to measure the Levenshtein distance between the two strings. If the distance is small relative to the length of the strings, they are probably the same name. Note that if you have to compare a lot of very short strings, it will be harder to tell whether they are the same, because a single edit is a large fraction of a short string. The approach works better with longer strings.
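As a rough illustration of the idea, here is a minimal sketch in Python rather than C#; the logic is plain loops and should port directly. The function names and the 0.25 threshold are assumptions made for this example, not values from the answer, and the threshold would need tuning against real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def relative_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def probably_same(a: str, b: str, threshold: float = 0.25) -> bool:
    """Hypothetical helper: threshold is an illustrative guess, not a standard."""
    return relative_distance(a.lower(), b.lower()) <= threshold


if __name__ == "__main__":
    print(probably_same("The Pig and Whistle", "Pig and Whistle"))
    print(probably_same("The Pig and Whistle", "The Pig & Whistle"))
    # Short names: one edit out of four characters already hits the threshold.
    print(probably_same("Blur", "Blue"))
```

The last call shows the caveat about short strings: two different four-letter names that differ by one character pass the same threshold that the longer venue names do.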

A smarter approach might be to compare the Levenshtein distance between the two strings but assign a distance of zero to the more obvious transformations, such as "and"/"&", "Snoop Doggy Dogg"/"Snoop", etc.
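One simple way to approximate zero-cost transformations is to normalize both names before measuring the distance, so that known variants collapse to the same text. The sketch below is in Python rather than C#, and the noise-word set, the alias table, and the function names are all hypothetical choices for illustration.

```python
import re

NOISE_WORDS = {"the"}                     # assumed list; extend as needed
ALIASES = {"snoop doggy dogg": "snoop"}   # hypothetical alias table


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (insert, delete, substitute), one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lower-case, treat '&' as 'and', strip punctuation, drop noise words."""
    text = name.lower().replace("&", " and ")
    tokens = re.findall(r"\w+", text)
    kept = [t for t in tokens if t not in NOISE_WORDS]
    # If every token was a noise word, keep the original tokens
    # rather than returning an empty name.
    canonical = " ".join(kept or tokens)
    return ALIASES.get(canonical, canonical)


def normalized_distance(a: str, b: str) -> int:
    """Edit distance after normalization; 0 means only 'free' differences."""
    return levenshtein(normalize(a), normalize(b))


if __name__ == "__main__":
    for other in ("Pig and Whistle", "The Pig & Whistle", "Pig and Whistel"):
        print(other, normalized_distance("The Pig and Whistle", other))
```

With this normalization the three venue names from the question compare at distance zero, and only a genuine spelling difference such as "Whistel" contributes to the score. Words in a different order are not handled here.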
