Regular expression for parsing mailing addresses


Problem description

I have an address class that uses a regular expression to parse the house number, street name, and street type from the first line of an address. This code is generally working well, but I'm posting here to share with the community and to see if anyone has suggestions for improvement.

Note: The STREETTYPES and QUADRANTS constants contain all of the relevant street types and quadrants, respectively.

I've included a subset here:

private const string STREETTYPES = @"ALLEY|ALY|ANNEX|AX|ARCADE|ARC|AVENUE|AV|AVE|BAYOU|BYU|BEACH|...";

private const string QUADRANTS = "N|NORTH|S|SOUTH|E|EAST|W|WEST|NE|NORTHEAST|NW|NORTHWEST|SE|SOUTHEAST|SW|SOUTHWEST";

HouseNumber, Quadrant, StreetName, and StreetType are all properties on the class.

    private void Parse(string line1)
    {
        HouseNumber = string.Empty;
        Quadrant = string.Empty;
        StreetName = string.Empty;
        StreetType = string.Empty;

        if (!String.IsNullOrEmpty(line1))
        {
            // Strip periods so abbreviations such as "N." or "St." match the bare tokens.
            // Replace returns a new string, so no explicit copy is needed.
            string noPeriodsLine1 = line1.Replace(".", "");

            string addressParseRegEx =
                @"(?ix)
            ^
            \s*
            (?:
               (?<housenumber>\d+)
               (?:(?:\s+|-)(?<quadrant>" +
                QUADRANTS +
                @"))?
               (?:(?:\s+|-)(?<streetname>\S+(?:\s+\S+)*?))??
               (?:(?:\s+|-)(?<quadrant>" +
                QUADRANTS + @"))?
               (?:(?:\s+|-)(?<streettype>" + STREETTYPES +
                @"))?
               (?:(?:\s+|-)(?<streettypequalifier>(?!(?:" +
                QUADRANTS +
                @"))(?:\d+|\S+)))?
               (?:(?:\s+|-)(?<streettypequadrant>(" +
                QUADRANTS + @")))??
               (?:(?:\s+|-)(?<suffix>(?:ste|suite|po\sbox|apt)\s*\S*))?
            |
               (?:(?:po|postoffice|post\s+office)\s+box\s+(?<postofficebox>\S+))
            )
            \s*
            $
            ";
            Match match = Regex.Match(noPeriodsLine1, addressParseRegEx);
            if (match.Success)
            {
                HouseNumber = match.Groups["housenumber"].Value;
                Quadrant = (string.IsNullOrEmpty(match.Groups["quadrant"].Value)) ? match.Groups["streettypequadrant"].Value : match.Groups["quadrant"].Value;
                if (match.Groups["streetname"].Captures.Count > 1)
                {
                    foreach (Capture capture in match.Groups["streetname"].Captures)
                    {
                        StreetName += capture.Value + " ";
                    }
                    StreetName = StreetName.Trim();
                }
                else
                {
                    StreetName = (string.IsNullOrEmpty(match.Groups["streetname"].Value)) ? match.Groups["streettypequalifier"].Value : match.Groups["streetname"].Value;
                }
                StreetType = match.Groups["streettype"].Value;

                // If the matched street type has a known abbreviation, use the
                // abbreviated form (this matters especially for credit bureau calls).
                string streetTypeAbbreviation;
                if (StreetTypes.TryGetValue(StreetType.ToUpper(), out streetTypeAbbreviation))
                {
                    StreetType = streetTypeAbbreviation;
                }
            }
        }

    }
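
One detail worth checking in isolation is how .NET treats the `quadrant` group, which appears twice in the pattern: both occurrences feed the same group, and `.Value` returns the last capture. The following is a minimal sketch rather than the class above. The pattern is deliberately shortened, and `AddressParseDemo`, `Pattern`, `Parse`, and the abbreviated constant lists are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressParseDemo
{
    // Hypothetical, shortened subsets; the real constants list every value.
    private const string StreetTypes = "STREET|ST|AVENUE|AVE|AV";
    private const string Quadrants = "NE|NW|SE|SW|N|S|E|W";

    // Simplified pattern: house number, optional leading quadrant,
    // street name, optional street type, optional trailing quadrant.
    // (?ix) = ignore case + ignore pattern whitespace, as in the question.
    private static readonly string Pattern =
        @"(?ix)
        ^\s*
        (?<housenumber>\d+)
        (?:\s+(?<quadrant>" + Quadrants + @"))?
        \s+(?<streetname>\S+(?:\s+\S+)*?)
        (?:\s+(?<streettype>" + StreetTypes + @"))?
        (?:\s+(?<quadrant>" + Quadrants + @"))?
        \s*$";

    public static Match Parse(string line)
    {
        return Regex.Match(line, Pattern);
    }

    public static void Main()
    {
        string[] samples = { "123 N Main St", "100 N Main St SW", "PO Box 12" };
        foreach (string line in samples)
        {
            Match m = Parse(line);
            if (!m.Success)
            {
                Console.WriteLine(line + " -> no match");
                continue;
            }

            // Both (?<quadrant>...) occurrences share one Group;
            // Value is the last capture, Captures holds all of them.
            Group quadrant = m.Groups["quadrant"];
            Console.WriteLine(
                "{0} -> number={1}, name={2}, type={3}, quadrant={4} ({5} capture(s))",
                line,
                m.Groups["housenumber"].Value,
                m.Groups["streetname"].Value,
                m.Groups["streettype"].Value,
                quadrant.Value,
                quadrant.Captures.Count);
        }
    }
}
```

With this simplified pattern, "123 N Main St" yields quadrant N from a single capture, while "100 N Main St SW" records two quadrant captures and `.Value` reports SW, so a leading quadrant is silently dropped unless `Captures` is inspected. "PO Box 12" does not match here because the sketch omits the post-office-box branch.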

Solution

I don't know what country you're in, but if you're in the USA and want to spend some money on address validation, you can buy the relevant USPS products at http://www.usps.com/ncsc/addressinfo/addressinfomenu.htm?from=zclsearch&page=ais&WT.z%5Fzip4link=AIS. The USPS also provides free word lists of expected words and abbreviations. I'm sure similar pages are available for other countries.
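
As a rough illustration of how such a word list could back the `StreetTypes.TryGetValue` lookup in the question, here is a minimal sketch. The class name `StreetTypeLookup`, the method `Normalize`, and the handful of entries are assumptions for illustration only; a real table would be loaded from the full USPS list rather than hard-coded.

```csharp
using System;
using System.Collections.Generic;

public static class StreetTypeLookup
{
    // Sample entries only (hypothetical subset); keys are compared
    // case-insensitively so callers need not upper-case the input first.
    private static readonly Dictionary<string, string> Abbreviations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ALLEY", "ALY" },
            { "AVENUE", "AVE" },
            { "AV", "AVE" },
            { "BOULEVARD", "BLVD" },
            { "STREET", "ST" },
        };

    // Returns the abbreviation for a known street type; otherwise returns
    // the trimmed input in upper case so unknown types pass through.
    public static string Normalize(string streetType)
    {
        if (string.IsNullOrWhiteSpace(streetType))
        {
            return string.Empty;
        }

        string key = streetType.Trim();
        string abbreviation;
        return Abbreviations.TryGetValue(key, out abbreviation)
            ? abbreviation
            : key.ToUpperInvariant();
    }

    public static void Main()
    {
        foreach (string input in new[] { "Avenue", "av", "Boulevard", "Court" })
        {
            Console.WriteLine(input + " -> " + Normalize(input));
        }
    }
}
```

Keeping the table in one dictionary means the regular expression's street-type alternation and the abbreviation step can, in principle, be driven from the same data instead of two hand-maintained lists.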
