preg_match to extract mailto on anchor


Question

I need to get the email address from an anchor whose href is a mailto: link, using a regex.

This pattern: (.*)<a\s(.*?)(.*)\s*href\=['"]mailto:([-a-z0-9_]+)@([a-z0-9-]+).([a-z]+)['"]>(.*)</a>(.*)

It works in Regex Coach, but it doesn't work in PHP.

Code:

preg_match("'(.*)<a (.*?)(.*) *href\=['\"]mailto:([-a-z0-9_]+)@([a-z0-9-]+).([a-z]+)['\"]>(.*)</a>(.*)'si", "<a href=\"mailto:someemail@ohio.com\">Some email</a>", $matches);

print_r($matches);

So why doesn't it work in PHP?

Recommended answer

PHP's PCRE functions require the regular expression to be wrapped in delimiters that separate the pattern from optional modifiers. Here the first non-alphanumeric character is taken as the delimiter (i.e. '), so the pattern is actually just (.*)<a (.*?)(.*) *href\=[ and everything after the next ' is treated as modifiers. That is an invalid regular expression: the [ opens a character class that is never closed, and the rest are not valid modifiers either.

As the others have already suggested, you can fix this either by escaping every occurrence of the delimiter ' inside the regular expression or by choosing a different delimiter that does not appear in the regular expression.
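As a rough sketch of the second option, the snippet below uses ~ as the delimiter, since ~ does not occur in the pattern. The helper name extract_mailto_regex and the trimmed-down pattern are assumptions made for illustration, not the asker's original code; the pattern also escapes the dot before the top-level domain, which the original left as a bare `.` (any character).

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// mailto: address found in $html, or null when there is none.
function extract_mailto_regex(string $html): ?string
{
    // "~" is the delimiter, so the quotes inside the character classes
    // need escaping only for the PHP string literal, not for PCRE.
    // The trailing "i" modifier makes the match case-insensitive.
    $pattern = '~<a\s[^>]*href=[\'"]mailto:([-a-z0-9_.]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+)[\'"]~i';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$html = '<a href="mailto:someemail@ohio.com">Some email</a>';
echo extract_mailto_regex($html), PHP_EOL; // prints someemail@ohio.com
```

Dropping the leading and trailing (.*) groups is a deliberate simplification: preg_match already searches anywhere in the subject, so they add nothing but backtracking.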

But besides that, trying to parse HTML with regular expressions is very error-prone. In your case, using that many .* will also result in terrible performance, because the engine has to backtrack through the many possible ways of splitting the input among them.

It is better to use a proper HTML parser that returns a DOM you can query, such as PHP's DOM library:

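$doc = new DOMDocument();
$doc->loadHTML($str); // $str holds the HTML to search
foreach ($doc->getElementsByTagName("a") as $a) {
    if ($a->hasAttribute("href")) {
        $href = trim($a->getAttribute("href"));
        // Compare the scheme case-insensitively: "MAILTO:" is valid too.
        if (strtolower(substr($href, 0, 7)) === 'mailto:') {
            // For a mailto: URI, parse_url() puts the address in $components['path'].
            $components = parse_url($href);
        }
    }
}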
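
The loop above stops at parse_url() without using the result. A self-contained variant that collects the addresses might look like the following sketch. The function name extract_mailto_addresses and the sample HTML are assumptions for illustration; the libxml_use_internal_errors() calls are an addition that only keeps warnings about imperfect markup from being printed.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// address found in a mailto: href within $html.
function extract_mailto_addresses(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $addresses = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // getAttribute() returns an empty string when href is missing.
        $href = trim($a->getAttribute('href'));
        if (strncasecmp($href, 'mailto:', 7) !== 0) {
            continue; // not a mailto: link
        }
        // For mailto: URIs the address is the path component.
        $path = parse_url($href, PHP_URL_PATH);
        if (is_string($path) && $path !== '') {
            $addresses[] = rawurldecode($path);
        }
    }
    return $addresses;
}

$html = '<p><a href="mailto:someemail@ohio.com">Some email</a> <a href="/about">About</a></p>';
print_r(extract_mailto_addresses($html));
```

With the sample input, the returned array contains the single entry someemail@ohio.com; the /about link is skipped because its href does not start with mailto:.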
