SQL address data is messy, how to clean it up in a query?

Problem description

I have address data stored in a SQL Server 2000 database, and I need to pull out all the addresses for a given customer code. The problem is that many addresses are misspelled, some have missing parts, and so on, so I need to clean this up somehow. I need to weed out the bad spellings, missing parts, etc. and come up with the "average" record. For example, if New York is spelled properly in 4 out of 5 records, that should be the value returned.

I can't modify the data, validate it on input, or anything like that. I can only modify a copy of the data, or manipulate it through a query.

I got a partial answer in "Addresses stored in SQL server have many small variations(errors)", but I need to allow for multiple valid addresses per code.

Sample Data

Code    Name                      Address1                      Address2           City            State          Zip       TimesUsed
10003   AMERICAN NUTRITON INC     2183 BALL STREET                                 OLDEN           Utah           87401     177
10003   AMEICAN NUTRITION INC     2183 BALL STREET              PO BOX 1504        OLDEN           Utah           87402     76
10003   AMERICAN NUTRITION INC    2183 BALL STREET                                 OLDEN           Utah           87402     24
10003   AMERICAN NUTRITION INC    2183 BALL STREET              PO BOX 1504        OLDEN           Utah           87402     17
10003   Samantha Brooks           506 S. Main Street                               Ellensburg      Washington     98296     1
10003   BEMIS COMPANY             1401 W. FOURTH PLAIN BLVD.                       VANCOUVER       Washington     98660     1
10003   CEI                       597 VANDYRE BOULEVARD                            WRIGHTSTOWN     Wisconsin      54180     1
10003   Pacific Pet               28th Avenue                                      OLDEN           Utah           84401     1
10003   PETSMART, INC.            16091 NORTH 25TH STREET                          PHOENA          Arizona        85027     1
10003   THE PET FIRM              16418 NORTH 37TH STREET                          PHOENA          Arizona        85503     1

Desired Output

Code    Name                      Address1                      Address2           City            State          Zip
10003   AMERICAN NUTRITION INC    2183 BALL AVENUE                                 Olden           Utah           84401
10003   Samantha Brooks           506 S. Main Street                               Ellensburg      Washington     98296
10003   BEMIS COMPANY             1401 W. FOURTH PLAIN BLVD.                       VANCOUVER       Washington     98660
10003   CEI                       975 VANDYKE ROAD                                 WRIGHTSTOWN     Wisconsin      54180
10003   Pacific Pet               29th Street                                      OGDEN           Utah           84401
10003   PETSMART, INC.            16091 NORTH 25TH AVENUE                          PHOENA          Arizona        85027
10003   THE PET FIRM              16418 NORTH 37TH STREET                          PHOENA          Arizona        85503

Solution

The best solution is to use a CASS-certified address standardization program or service that will format and validate the addresses. Besides the USPS, which has tools for this, there are many third-party programs and services that provide this functionality. Address parsing is far more complicated than you might imagine, so trying to whip up a few queries to do it will be fraught with peril.

Google's Geocoding is another place to look. However, Google apparently requires you to display the results in order to use their Geocoding service. That leaves dedicated address parsers such as the USPS tools or a third-party program.
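To illustrate why hand-rolled cleanup only goes so far, here is a minimal sketch of the "most common value wins" idea from the question, run against a copy of a few of the sample rows outside the database. It is not a CASS tool and not the approach recommended above. The helper names (`match_key`, `cluster_rows`, `consensus`), the 0.85 similarity threshold, and the choice to compare name plus first address line are all assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher

FIELDS = ("name", "address1", "address2", "city", "state", "zip")

# A subset of the sample rows for customer code 10003 (TimesUsed omitted).
ROWS = [dict(zip(FIELDS, r)) for r in [
    ("AMERICAN NUTRITON INC", "2183 BALL STREET", "", "OLDEN", "Utah", "87401"),
    ("AMEICAN NUTRITION INC", "2183 BALL STREET", "PO BOX 1504", "OLDEN", "Utah", "87402"),
    ("AMERICAN NUTRITION INC", "2183 BALL STREET", "", "OLDEN", "Utah", "87402"),
    ("AMERICAN NUTRITION INC", "2183 BALL STREET", "PO BOX 1504", "OLDEN", "Utah", "87402"),
    ("Samantha Brooks", "506 S. Main Street", "", "Ellensburg", "Washington", "98296"),
    ("CEI", "597 VANDYRE BOULEVARD", "", "WRIGHTSTOWN", "Wisconsin", "54180"),
]]


def match_key(row):
    """Text used to decide whether two rows describe the same address (assumed heuristic)."""
    return f"{row['name']} {row['address1']}".upper()


def cluster_rows(rows, threshold=0.85):
    """Greedy grouping: a row joins the first cluster whose first member is similar enough."""
    clusters = []
    for row in rows:
        for cluster in clusters:
            ratio = SequenceMatcher(None, match_key(cluster[0]), match_key(row)).ratio()
            if ratio >= threshold:
                cluster.append(row)
                break
        else:
            # No existing cluster was close enough, so this row starts a new one.
            clusters.append([row])
    return clusters


def consensus(cluster):
    """For each field, keep the value that appears in the most rows of the cluster."""
    return {f: Counter(row[f] for row in cluster).most_common(1)[0][0] for f in FIELDS}


if __name__ == "__main__":
    for cluster in cluster_rows(ROWS):
        print(consensus(cluster))
```

On these rows the four AMERICAN NUTRITION variants fall into one group and the vote settles on the correctly spelled name, but the city stays "OLDEN" because every record in the group has it. A majority vote can only choose among values that already exist in the data, which is why a dedicated address standardization service remains the safer route.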
