Tools for matching name/address data

Problem description

This is an interesting problem.

I have an Oracle database with name and address information that needs to be kept current.

We get data feeds from a number of different government sources and need to figure out which records match, and then whether to update the database with the incoming data or create a new record.

There isn't any sort of unique identifier that can be used to tie records together, and the data quality isn't always that good: there will always be typos, people using different names (e.g., Joe vs. Joseph), and so on.

I'd be interested in hearing from anyone who has worked on this type of problem before about how they solved it, or at least automated parts of it.

Recommended answer

Each of the major software companies active in this space offers a solution suite that handles name and address parsing, data standardization, record deduplication or matching, record linking/merging, survivorship, and so on. They're all a bit pricey, though.
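To make the standardization and matching steps concrete, here is a minimal Python sketch using only the standard library. It is not how any of the commercial suites work internally; the nickname table, the abbreviation table, the equal name/address weighting, and the two thresholds are all made-up assumptions for illustration and would need tuning against real data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; real ones would be far larger.
NICKNAMES = {"joe": "joseph", "bill": "william", "bob": "robert"}
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize(text, mapping):
    """Lowercase, drop punctuation, and expand tokens found in mapping."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(mapping.get(token, token) for token in tokens)


def similarity(a, b):
    """Return a ratio between 0.0 and 1.0 that tolerates small typos."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b):
    """Average the name and address similarity (assumed equal weights)."""
    name = similarity(
        standardize(rec_a["name"], NICKNAMES),
        standardize(rec_b["name"], NICKNAMES),
    )
    address = similarity(
        standardize(rec_a["address"], ABBREVIATIONS),
        standardize(rec_b["address"], ABBREVIATIONS),
    )
    return 0.5 * name + 0.5 * address


def classify(incoming, existing, update_at=0.9, review_at=0.75):
    """Decide whether an incoming record updates, needs review, or is new."""
    best = max(existing, key=lambda r: match_score(incoming, r), default=None)
    score = match_score(incoming, best) if best is not None else 0.0
    if score >= update_at:
        return "update", best
    if score >= review_at:
        return "review", best  # borderline: send to a person
    return "create", None


existing = [{"name": "Joseph Smith", "address": "12 Main Street"}]
print(classify({"name": "Joe Smith", "address": "12 Main St."}, existing))
print(classify({"name": "Maria Garcia", "address": "99 Oak Avenue"}, existing))
```

The middle "review" band reflects the "at least automated parts of it" goal in the question: clear matches and clear non-matches are handled automatically, and only borderline scores go to a person. Comparing every incoming record against every stored record does not scale, so a real job would first narrow the candidates (for example by postal code) before scoring.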

For example, Oracle's own solution for this problem is the product "Oracle Data Quality (ODQ) for Oracle Data Integrator (ODI)," which is part of their Fusion Middleware stack. As the name implies, ODQ requires ODI (i.e., it is an add-on module that is licensed separately and is dependent on ODI).

IBM's WebSphere solution suite (obtained through their Ascential acquisition) includes QualityStage.

Business Objects, now an SAP company, has a Data Quality product under its Enterprise Information Management (EIM) suite.

Other major data quality brands include DataFlux (a SAS company) and Trillium Software (a Harte-Hanks company).

The Gartner Group releases an annual Magic Quadrant for data quality solution suites. Vendors who rate well in these Magic Quadrants usually make the entire report available online to registered users on their website (example 1, example 2).