Regex street address match


Question

While I know that matching a street address will never be perfect, I'm looking to create a couple of regex statements that will get close most of the time.

I'm trying to highlight an address. I'm bad at regex and have tried to get close, but could someone help me understand how I can make this better?

String:

6 am - 11 pM , Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 Come find just near the dsfsd sa fsa fasdf asfsds 5001 west your momma doesn't live here my 2005 ford ranger,

Regex 1:

/\s+(\d{2,5}\s+)(?![a|p]m\b)(([a-zA-Z|\s+]{1,5}){1,2})?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)([\s|\,|.|\;]+)?(([a-zA-Z|\s+]{1,30}){1,2})([\s|\,|.]+)?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)([\s|\,|.]+)?(\s+\d{5})?([\s|\,|.]+)/i

(Sometimes there's just a street and city, but no state or zip.)

Regex 2:

/\b(\d{2,5}\s+)(?![a|p]m\b)(NW|NE|SW|SE|north|south|west|east|n|e|s|w)?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)/i

Fiddle: http://jsfiddle.net/isuelt/rMC6P/11/

Recommended answer

US addresses are not a regular language, and cannot be matched by using regular expressions. They are helpful in some isolated cases, but in general, they will fail you, especially for input like that.
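To make the "isolated cases" point concrete, here is a minimal JavaScript sketch. The pattern `STREET_RE` and the helper `findStreetCandidates` are hypothetical, simplified stand-ins, not the regexes from the question. The `Avenue|Ave` suffixes are an assumption added for this sketch; the question's suffix lists do not include `ave`. The sketch finds the street part of the sample input, but it also flags ordinary prose and misses a real address that does not fit the expected shape.

```javascript
// Hypothetical, simplified pattern (an assumption for illustration only):
// house number, one to three words, a street suffix, optional direction.
const STREET_RE =
  /\b\d{2,5}\s+(?:[A-Za-z]+\s+){1,3}(?:Court|Ct|Street|St|Drive|Dr|Lane|Ln|Road|Rd|Blvd|Avenue|Ave)\b\.?(?:\s+(?:NW|NE|SW|SE|N|E|S|W)\b)?/gi;

// Return every substring the heuristic considers a street address.
function findStreetCandidates(text) {
  return Array.from(text.matchAll(STREET_RE), (m) => m[0]);
}

const sample =
  "6 am - 11 pM , Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 " +
  "Come find just near the dsfsd sa fsa fasdf asfsds 5001 west your momma " +
  "doesn't live here my 2005 ford ranger,";

// Works on the sample input from the question.
console.log(findStreetCandidates(sample));
// → [ '6806 Fifth Ave NW' ]

// False positive: plain prose that happens to fit the shape.
console.log(findStreetCandidates("I paid 250 dollars for the drive"));
// → [ '250 dollars for the drive' ]

// False negative: a real address with a one-digit number and no listed suffix.
console.log(findStreetCandidates("1 Infinite Loop, Cupertino, CA"));
// → []
```

Loosening the pattern to catch the third case makes the second kind of error more frequent, which is the trade-off that makes a purely regex-based approach unreliable on free text.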

I used to work at an address verification company. In answer to your question, to "highlight an address" in a string of text, I recommend you try an extraction utility. There are a few out there and I suggest you look around, but here is ours using the input from your question; as you can see, it found the address and validated it:

The API endpoint returns JSON which contains the start and end positions of each address, as well as plenty of information about each one. (See the CSV output at the bottom of the picture above.)
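Once start and end positions are available, highlighting is mostly string slicing. The sketch below assumes a hypothetical response shape: the field names `addresses`, `start`, and `end` are illustrative and not the actual API's schema, and `end` is assumed to be exclusive. The helper `highlightSpans` is likewise made up for this example.

```javascript
// Wrap each [start, end) span of `text` in open/close markers.
// Assumption: `end` is exclusive; adjust if the real API differs.
function highlightSpans(text, spans, open = "<mark>", close = "</mark>") {
  const ordered = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const { start, end } of ordered) {
    // Skip spans that are empty, out of range, or overlap an earlier one.
    if (start < cursor || end <= start || end > text.length) continue;
    out += text.slice(cursor, start) + open + text.slice(start, end) + close;
    cursor = end;
  }
  return out + text.slice(cursor);
}

const text = "Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 Come find";
const address = "6806 Fifth Ave NW, Bradenton, FL 34209";
const start = text.indexOf(address);

// Stand-in for the JSON body an extraction service might return (hypothetical shape).
const body = JSON.stringify({ addresses: [{ start, end: start + address.length }] });

const { addresses } = JSON.parse(body);
console.log(highlightSpans(text, addresses));
// → Palma Sola Elementary, <mark>6806 Fifth Ave NW, Bradenton, FL 34209</mark> Come find
```

If the result is inserted into a page as HTML, the text between the markers should be HTML-escaped first; that step is left out here to keep the sketch short.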

I commend you for braving those regular expressions you tried! Hopefully this is helpful.
