MySQL街道地址模糊搜索 [英] MySQL street address fuzzy search
问题描述
有谁知道在MySQL数据库上进行模糊街道地址搜索的好脚本(或好策略)?关键问题是:
$ b $ ul
我希望能够匹配:
123 Main st,unit B = 123 Main Street Unit b
尽管它并不完美并且速度很慢,但您仍然希望通过REGEXP()使用正则表达式。
下面是第一遍正则表达式,以匹配大多数情况(以及您的示例):
(?isx)#search across多行并忽略大小写
(#完全匹配
(#st数字 - 数字词怎么样像一个或两个?
\ d +
)
\ s + #whitespace
(#街道名称(一个或多个单词)
[az] +
(?:
\s +
[az] +
)*
)
\ s +#whitespace
(#街道类型
al(?:y \。?| ley)#aly,aly。或胡同
|
ave(?:\。| nue)? #ave,ave。或avenue
|
b(?lvd \。?| oulevard)#blvd,blvd。或林荫大道
|
c(?:t \。?| ourt)#ct,ct。或法院
|
cir(?:\c\。?| cle)? #cir,circ,circ。或圈出
|
cres(?:\。| cent)? #cres,cres。或新月
|
dr(?:\。| ive)? #博士,博士。或驱动
|
exp(?:y \。?| ressway)#expy,expy。或高速公路
|
f(?:wy\。?| reeway)#fwy,fwy。或高速公路
|
g(?:rdns\。?| ardens)#grdns,grdns。或花园
|
h(?:wy\。?| ighway)#hwy,hwy。或高速公路
|
l(?n \。?| ane)#ln,ln。或土地
|
m(?:nr \。?| anor)#mnr,mnr。或庄园
|
m(?:trwy \。?| otorway)#mtrwy,wtrwy。或高速公路
|
pl(?:\。| ace)? #pl,pl。或放置
|
r(?:d \。?| oad)#rd,rd。或道路
|
st(?:\。| reet)? #st,st。或街道
|
t(?:pk\。?| urnpike)#tpk,tpk。或收费
|
ter(?:\r?\。?| race)#ter,ter。,terr,terr。或露台
|
tr(?:l.\?| ail)#trl,trl。或者追踪
|
派克|公园|步行|环形|海湾|关闭|大门|高地
|
row | way | oval | dell | rise | vale | byway | lawn
)
\ ,? #可选逗号
\s +#空格
(#可选号码,单位,apt或floor
(
\##number
|
unit #unit
|
num(?:\。| ber)#num,num。or number
|
ap(?:t\。?| artment)# apt,apt。or apartment
|
fl(?:\。| oor)?#fl,fl。or floor
)
\s +
\\ \\ d +
)?
)
哪些会返回:
$ 1 - 完全匹配
<2> $ 2 - 房屋编号
$ 3 - 街道名称
$ 4 - 街道类型
$ 5 - 单位或合住号码(可选)
要在mysql中使用它,您需要删除所有注释(从'#'到eol),删除第一行(切换选项),并且将所有内容全部折叠为一行而不用任何空格。
Does anyone know of a good script (or good strategy) for doing a fuzzy street address search on a MySQL database? The key issues are:
- capitalization (easy -- just use LCASE)
- punctuation (could use REPLACE; not sure it there are more efficient options)
- abbreviations (this is the tough one -- street = st, etc.)
I want to be able to match: 123 Main st, unit B = 123 Main Street Unit b
While it's not perfect and can be quite slow, you'll want to use a regular expression via REGEXP().
Here's a first-pass regular expression to match most cases (as well as your example):
(?isx) # search across multiple lines and ignore case
( # full match
( # st number - what about number words like one or two?
\d+
)
\s+ # whitespace
( # street name (one or more words)
[a-z]+
(?:
\s+
[a-z]+
)*
)
\s+ # whitespace
( # street type
al(?:y\.?|ley) # aly, aly. or alley
|
ave(?:\.|nue)? # ave, ave., or avenue
|
b(?lvd\.?|oulevard) # blvd, blvd. or boulevard
|
c(?:t\.?|ourt) # ct, ct. or court
|
cir(?:\c\.?|cle)? # cir, circ, circ. or circle
|
cres(?:\.|cent)? # cres, cres. or crescent
|
dr(?:\.|ive)? # dr, dr. or drive
|
exp(?:y\.?|ressway) # expy, expy. or expressway
|
f(?:wy\.?|reeway) # fwy, fwy. or freeway
|
g(?:rdns\.?|ardens) # grdns, grdns. or gardens
|
h(?:wy\.?|ighway) # hwy, hwy. or highway
|
l(?n\.?|ane) # ln, ln. or land
|
m(?:nr\.?|anor) # mnr, mnr. or manor
|
m(?:trwy\.?|otorway) # mtrwy, wtrwy. or motorway
|
pl(?:\.|ace)? # pl, pl. or place
|
r(?:d\.?|oad) # rd, rd. or road
|
st(?:\.|reet)? # st, st. or street
|
t(?:pk\.?|urnpike) # tpk, tpk. or turnpike
|
ter(?:\r?\.?|race) # ter, ter., terr, terr. or terrace
|
tr(?:l.\?|ail) # trl, trl. or trail
|
pike|park|walk|loop|bay|close|gate|highlands
|
row|way|oval|dell|rise|vale|byway|lawn
)
\,? # optional comma
\s+ # whitespace
( # optional number, unit, apt or floor
(
\# # number
|
unit # unit
|
num(?:\.|ber) # num, num. or number
|
ap(?:t\.?|artment) # apt, apt. or apartment
|
fl(?:\.|oor)? # fl, fl. or floor
)
\s+
\d+
)?
)
Which will return:
$1 - full match
$2 - house number
$3 - street name
$4 - street type
$5 - unit or apt number (optional)
To use this in mysql, you'll need to strip out all the comments (from '#' to eol), remove the first line (switch options), and collapse everything to a single line without any whitespace.
这篇关于MySQL街道地址模糊搜索的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!