Extract address from string

Problem description

Suppose I have this string:

<div>john doe is nice guy btw 8240 E. Marblehead Way 92808  is also</div>

Or this string:

<div>sky being blue? in the world is true? 024 Brea Mall  Brea, California 92821 jackfroast nipping on the firehead</div>

How would I go about extracting the address from one of these strings? This would involve some sort of regex, right?

I've tried looking online for a solution using JavaScript or PHP, but to no avail. And no other post here on Stack Overflow (as far as I know) provides a solution that uses jQuery and/or JavaScript and/or PHP. (The closest is "Parse usable Street Address, City, State, Zip from a string", which doesn't have any code in the thread about extracting a postal code from a string.)

Can somebody point me in the right direction? How would I go about accomplishing this in jQuery, JavaScript or PHP?

Recommended answer

I tried this on twelve different strings that were similar to yours and it worked just fine:

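function str_to_address($context) {

    // Split on spaces and reverse, so the zip search starts from the end.
    $context_parts = array_reverse(explode(" ", $context));
    $zipKey = null;
    foreach ($context_parts as $key => $str) {
        if (strlen($str) === 5 && is_numeric($str)) {
            $zipKey = $key;
            break;
        }
    }
    if ($zipKey === null) {
        return ""; // no zip-like token, so there is nothing to extract
    }

    // Drop everything after the zip, then restore the original word order.
    $context_parts_cleaned = array_slice($context_parts, $zipKey);
    $context_parts_normalized = array_reverse($context_parts_cleaned);

    // The zip itself passes the test below, so a key is always found.
    $houseNumberKey = null;
    foreach ($context_parts_normalized as $key => $str) {
        if (strlen($str) > 1 && strlen($str) < 6 && is_numeric($str)) {
            $houseNumberKey = $key;
            break;
        }
    }

    // Keep the tokens from the house number through the zip.
    $address_parts = array_slice($context_parts_normalized, $houseNumberKey);
    $string = implode(' ', $address_parts);
    return $string;
}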

This assumes a house number of at least two and at most five digits (the check is strlen($str) < 6). It also assumes that the zip code isn't in the "expanded" form (e.g. 12345-6789). However, this can easily be modified to fit that format; a regex would be a good option here, something like (\d{5}-\d{4}).
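As a rough sketch of that modification, the five-character check could be swapped for a pattern applied to one token at a time. The helper name looks_like_zip is made up for illustration and is not part of the original answer; the pattern accepts both the plain and the expanded form.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// true when $token looks like a US zip, either 12345 or 12345-6789.
function looks_like_zip(string $token): bool
{
    // ^ and \z anchor the match to the whole token, so "123456" is rejected.
    return preg_match('/^\d{5}(?:-\d{4})?\z/', $token) === 1;
}

var_dump(looks_like_zip("92808"));      // bool(true)
var_dump(looks_like_zip("12345-6789")); // bool(true)
var_dump(looks_like_zip("8240"));       // bool(false)
```

Inside str_to_address, a call like this could stand in for the strlen($str) === 5 && is_numeric($str) test in the first loop.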

But using regex for parsing user-inputted data... Not a good idea here, because we just don't know what a user is going to input; there was (as one can assume) no validation.
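Because the input is unvalidated, it may help to clean it before splitting. The sketch below is one possible pre-processing step, assuming the strings arrive wrapped in markup as in the question; tokenize_context is a hypothetical name, not something from the original answer.

```php
<?php
// Hypothetical pre-processing helper; name and behaviour are assumptions.
function tokenize_context(string $context): array
{
    // Drop markup such as <div>...</div> that would otherwise stick to words.
    $plain = strip_tags($context);
    // Split on any run of whitespace and discard empty tokens.
    return preg_split('/\s+/', $plain, -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = tokenize_context("<div>john doe is nice guy btw 8240 E. Marblehead Way 92808  is also</div>");
echo implode("|", $tokens), "\n";
// john|doe|is|nice|guy|btw|8240|E.|Marblehead|Way|92808|is|also
```

Unlike explode(" ", ...), this does not produce an empty token for the double space after the zip.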

Walking through the code and logic, starting with creating the array from the context and grabbing the zip:

// split the context (for example, a sentence) into an array,
// so we can loop through it.
// we reverse the array, as we're going to grab the zip first.
// why? we assume the zip is exactly 5 characters long.
$context_parts = array_reverse(explode(" ", $context));

// we're going to store the array index of the zip code for later use
// (null until one is found)
$zipKey = null;

// foreach iterates over the array;
// in this case it's like doing...
// for each value of $context_parts ($str), and each index ($key)
foreach($context_parts as $key=>$str) {

    // if $str is 5 chars long, and numeric...
    // an incredibly lazy check for a zip code...
    if(strlen($str)===5 && is_numeric($str)) {
        $zipKey = $key;

        // we have what we want, so we can leave the loop with break
        break;
    }
}
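To make the role of $zipKey concrete, here is a small standalone run of the same loop on a shortened version of the first sample string (markup removed for readability; the shortened string is an assumption made for the demo).

```php
<?php
// Shortened sample sentence based on the question, without the markup.
$context = "btw 8240 E. Marblehead Way 92808 is also";

// array_reverse() re-indexes numeric keys by default, so key 0 is the last word.
$context_parts = array_reverse(explode(" ", $context));

$zipKey = null;
foreach ($context_parts as $key => $str) {
    if (strlen($str) === 5 && is_numeric($str)) {
        $zipKey = $key;
        break;
    }
}

// $zipKey counts how many tokens follow the zip in the original sentence.
echo $zipKey, " => ", $context_parts[$zipKey], "\n"; // prints "2 => 92808"
```

Because the key counts tokens from the end, slicing the reversed array at $zipKey discards exactly the words that come after the zip.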

Do some tidying so we have a better array to grab the house number from:

// bail out if the loop never found a zip
// ($zipKey is only an int on success)
if (!is_int($zipKey)) {
    return "";
}

// remove junk from $context_parts, since we don't
// need stuff after the zip
$context_parts_cleaned = array_slice($context_parts, $zipKey);

// since the house number comes first, let's go back to the start
$context_parts_normalized = array_reverse($context_parts_cleaned);

And then let's grab the house number, using the same basic logic that we used for the zip code:

// the zip itself passes the test below, so a key is always found
$houseNumberKey = null;
foreach($context_parts_normalized as $key=>$str) {
    if(strlen($str)>1 && strlen($str)<6 && is_numeric($str)) {
        $houseNumberKey = $key;
        break;
    }
}

// we probably have the parts we need for the address.
// let's do some more cleaning
$address_parts = array_slice($context_parts_normalized, $houseNumberKey);

// and build the string again, from the address
$string = implode(' ', $address_parts);

// and return the string
return $string;
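For experimenting with the steps walked through above, here is a compact restatement that can be run against the two strings from the question. It is a sketch, not the answer's exact code: extract_address is a hypothetical name, it strips markup and collapses whitespace first, uses ctype_digit instead of is_numeric, and returns null when no zip-like token exists.

```php
<?php
// Hypothetical compact variant of the approach above (assumptions noted in the lead-in).
function extract_address(string $context): ?string
{
    $tokens = preg_split('/\s+/', strip_tags($context), -1, PREG_SPLIT_NO_EMPTY);

    // Last token made of exactly five digits: treated as the zip.
    $zipIndex = null;
    for ($i = count($tokens) - 1; $i >= 0; $i--) {
        if (strlen($tokens[$i]) === 5 && ctype_digit($tokens[$i])) {
            $zipIndex = $i;
            break;
        }
    }
    if ($zipIndex === null) {
        return null;
    }

    // First token of two to five digits at or before the zip: treated as the house number.
    for ($i = 0; $i <= $zipIndex; $i++) {
        $len = strlen($tokens[$i]);
        if ($len > 1 && $len < 6 && ctype_digit($tokens[$i])) {
            return implode(' ', array_slice($tokens, $i, $zipIndex - $i + 1));
        }
    }
    return null; // not reached in practice: the zip itself passes the test above
}

echo extract_address("<div>john doe is nice guy btw 8240 E. Marblehead Way 92808  is also</div>"), "\n";
// 8240 E. Marblehead Way 92808
echo extract_address("<div>sky being blue? in the world is true? 024 Brea Mall  Brea, California 92821 jackfroast nipping on the firehead</div>"), "\n";
// 024 Brea Mall Brea, California 92821
echo extract_address("I have 12 cats at 8240 E. Marblehead Way 92808"), "\n";
// 12 cats at 8240 E. Marblehead Way 92808
```

The third call shows a limitation of the heuristic: any earlier two-to-five-digit number in the sentence is taken as the house number.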
