Matching inexact company names in Java

Question

I have a database of companies. My application receives data that references a company by name, but the name may not exactly match the value in the database. I need to match the incoming data to the company it refers to.

For instance, my database might contain a company with name "A. B. Widgets & Co Ltd." while my incoming data might reference "AB Widgets Limited", "A.B. Widgets and Co", or "A B Widgets".

Some words in the company name (A B Widgets) are more important for matching than others (Co, Ltd, Inc, etc.). It's important to avoid false matches.
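One way to act on the idea that some words matter less is to normalize every name before comparing. The sketch below is only an illustration: the class name `CompanyNameNormalizer` and the list of noise words are assumptions, not part of the question. It lowercases the name, turns punctuation into spaces, drops low-information tokens, and merges runs of single letters so that "A. B." and "AB" compare equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: reduces a company name to a comparison key.
final class CompanyNameNormalizer {

    // Assumed list of low-information tokens; extend it for your own data.
    private static final Set<String> NOISE_WORDS = Set.of(
            "co", "company", "ltd", "limited", "inc", "incorporated",
            "corp", "corporation", "plc", "llc", "and");

    static String normalize(String raw) {
        String cleaned = raw.toLowerCase(Locale.ROOT)
                .replace("&", " and ")           // treat "&" like the word "and"
                .replaceAll("[^a-z0-9]+", " ")   // punctuation and other symbols become spaces
                .trim();

        List<String> kept = new ArrayList<>();
        StringBuilder initials = new StringBuilder();
        for (String token : cleaned.split(" ")) {
            if (token.isEmpty() || NOISE_WORDS.contains(token)) {
                continue;                        // skip noise words entirely
            }
            if (token.length() == 1) {
                initials.append(token);          // collect single letters: "a", "b" -> "ab"
                continue;
            }
            if (initials.length() > 0) {
                kept.add(initials.toString());   // flush collected initials before a longer word
                initials.setLength(0);
            }
            kept.add(token);
        }
        if (initials.length() > 0) {
            kept.add(initials.toString());
        }
        return String.join(" ", kept);
    }
}
```

With this noise list, all four spellings from the question ("A. B. Widgets & Co Ltd.", "AB Widgets Limited", "A.B. Widgets and Co", "A B Widgets") normalize to the same key, `ab widgets`. Note that the regular expression only keeps ASCII letters and digits, so names with accented characters would need a wider character class.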

The number of companies is small enough that I can maintain a map of their names in memory, i.e., I have the option of using Java rather than SQL to find the right name.

How would you do this in Java?

Recommended answer

Although this thread is a bit old, I recently did an investigation on the efficiency of string distance metrics for name matching and came across this library:

https://code.google.com/p/java-similarities/

If you don't want to spend ages implementing string distance algorithms, I recommend giving it a try as a first step. Around 20 different algorithms are already implemented (including Levenshtein, Jaro-Winkler and Monge-Elkan), and the code is structured well enough that you don't have to understand the whole logic in depth; you can start using it in minutes.

(BTW, I'm not the author of the library, so kudos to its creators.)
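To give a feel for what a string distance metric does, here is a minimal hand-written sketch of one of the algorithms named above, Levenshtein distance, used to pick the closest name from an in-memory list. It does not use the library's API; the class `ClosestNameFinder`, its method names, and the 0.7 threshold in the usage note are assumptions for illustration only. Rejecting anything below a threshold is one simple way to reduce false matches.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: picks the closest candidate by normalized Levenshtein similarity.
final class ClosestNameFinder {

    // Classic dynamic-programming edit distance, keeping only two rows in memory.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                          // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),  // insertion, deletion
                        prev[j - 1] + cost);                     // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Scales the distance to a similarity in [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Returns the most similar candidate, or empty if none reaches the threshold.
    static Optional<String> bestMatch(String query, List<String> candidates, double threshold) {
        String best = null;
        double bestScore = -1.0;
        String q = query.toLowerCase(Locale.ROOT);
        for (String candidate : candidates) {
            double score = similarity(q, candidate.toLowerCase(Locale.ROOT));
            if (score > bestScore) {              // first candidate wins ties
                best = candidate;
                bestScore = score;
            }
        }
        return bestScore >= threshold ? Optional.of(best) : Optional.empty();
    }
}
```

For example, `ClosestNameFinder.bestMatch("ab widgets", List.of("ab widgets ltd", "acme holdings"), 0.7)` selects "ab widgets ltd", while a query that resembles neither name yields an empty result. The threshold needs tuning against real data, and a metric such as Jaro-Winkler may suit short names better than plain edit distance.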
