Fuzzy Street Address Searches Using MySQL Fulltext (or Sphinx?)


Question

I have a database table full of addresses from Google Maps geocode responses. Google abbreviates all directions (West -> W, East -> E, etc).

So if I enter an address like "100 West Pender Street", the formatted address returned by Google Maps is "100 W Pender St", which I insert into my table.

Now if a user comes along and searches for that address, all of the following should match:

pender street
west pender street
100 pender
100 w pender
100 west pender

And they more or less do. However, the "w" in the table is ignored because it falls below the minimum word length, so addresses on East Pender are given equal weighting in the search results (the "E" is also ignored).

What's the best way to handle this?

I suspect setting the minimum word length to 1 is a "bad thing".

I could do a search and replace against the known abbreviations (N, E, S, W, St, Ave, Dr, etc.) in the Google addresses and replace them with their expansions -- but there are some street names where this is not valid (some cities have single-letter street names: J Street, etc.).
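A minimal sketch of that search-and-replace idea in Python, with one guard for the single-letter street name case. The function name `expand_abbreviations`, the two lookup tables, and the rule "a direction letter directly followed by a street suffix is the street name itself" are all assumptions made for illustration; real address data would need a larger table and more rules.

```python
import re

# Hypothetical, deliberately small lookup tables.
DIRECTIONS = {
    "n": "north", "e": "east", "s": "south", "w": "west",
    "ne": "northeast", "nw": "northwest", "se": "southeast", "sw": "southwest",
}
SUFFIXES = {"st": "street", "ave": "avenue", "dr": "drive",
            "rd": "road", "blvd": "boulevard"}


def expand_abbreviations(address):
    """Lowercase an address and expand known abbreviations.

    Assumed heuristics:
    - a direction letter directly followed by a street suffix is treated
      as the street name itself ("E St") and left alone;
    - a suffix is expanded only when it is the last token or is followed
      by a direction, so a leading "St" (as in "St Johns") is left alone.
    """
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in DIRECTIONS:
            is_street_name = nxt in SUFFIXES or nxt in SUFFIXES.values()
            out.append(tok if is_street_name else DIRECTIONS[tok])
        elif tok in SUFFIXES and (nxt is None or nxt in DIRECTIONS):
            out.append(SUFFIXES[tok])
        else:
            out.append(tok)
    return " ".join(out)


print(expand_abbreviations("100 W Pender St"))
print(expand_abbreviations("100 E St"))
```

Running the same function over both the stored address and the user's query would make "100 west pender" and "100 w pender" compare equal before the full-text engine ever sees them.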

Also, addresses like "123 160 St" are not searchable at all because the street number (123) and street name (160) both fall below the minimum word length.
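For context, the minimum word length is controlled by `ft_min_word_len` for MyISAM (default 4) and `innodb_ft_min_token_size` for InnoDB (default 3). One possible workaround that avoids changing those settings is to keep a separate, padded copy of the address for searching and pad the query tokens the same way. The sketch below is only an illustration: `pad_short_tokens` is a hypothetical helper, and it assumes the underscore is treated as a word character by the full-text parser in use, which should be verified on your MySQL version.

```python
def pad_short_tokens(text, min_len=4, fill="_"):
    """Pad every token shorter than min_len with a filler character.

    The padded string is meant for a separate search column; the same
    function must be applied to the user's query so the tokens line up.
    min_len=4 mirrors the MyISAM default and is an assumption here.
    """
    return " ".join(tok.ljust(min_len, fill) for tok in text.lower().split())


print(pad_short_tokens("123 160 St"))
print(pad_short_tokens("100 W Pender St"))
```

With this padding, "w" and "e" become distinct indexable tokens ("w___" and "e___"), so West Pender and East Pender no longer collapse into the same match.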

Is MySQL FullText the right approach for this? Does Sphinx offer something better?

Or is there another solution I haven't considered yet? Keep in mind that the user's search query will be matched not only against the property's address but also against other text columns such as the property name and description.

Solution

This is actually an incredibly difficult problem -- if you're on your own. I work in the address verification industry at a company called SmartyStreets, where our products perform the task you describe. It's a complicated sequence of operations that matches address searches to valid, even deliverable, endpoints. The accreditation for performing address lookups accurately, correctly, and completely is called CASS Certification.

The difference between Google's results and CASS-Certified results is that Google's algorithms are "best-guess". This is what Google is good at... unfortunately, that goes for addresses that aren't perfectly valid, too. (See: http://answers.smartystreets.com/questions/269/why-did-the-address-fail-validation-it-looks-good-to-me)

Fuzzy lookups with MySQL will yield results, and your code can have algorithms to help, but there's no guarantee of accuracy or validity, or, for that matter, of any worth at all.
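As an illustration of the kind of helper algorithm meant here, the following Python sketch re-ranks candidate rows returned by a full-text query using a plain string-similarity score from the standard library's `difflib`. The function name `rank_candidates` and the sample addresses are assumptions, and, as noted above, a similarity score says nothing about whether an address is valid or deliverable.

```python
from difflib import SequenceMatcher


def rank_candidates(query, candidates):
    """Sort candidate addresses by similarity to the query, best first.

    SequenceMatcher.ratio() returns a float in [0, 1]; 1.0 means the two
    strings are identical. This is a generic string measure, not an
    address-aware one.
    """
    q = query.lower()
    return sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, q, c.lower()).ratio(),
        reverse=True,
    )


rows = ["100 east pender street", "100 west pender street"]
for address in rank_candidates("100 west pender street", rows):
    print(address)
```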

I don't think you'll want your users to get wrong addresses in response to their query. It makes your service appear sub-par, and the users won't get the value they expect (right?) ... I suggest you find a vendor of CASS software. You can Google "address verification", for example -- the best web-based solution I can recommend is SmartyStreets' LiveAddress API.
