PHP regex for parsing a link
Question
I have a PHP script that parses the POST content of a form (message) and transforms any URL into a real HTML link. These are the two regular expressions I use:
$dbQueryList['sb_message'] = preg_replace("#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $dbQueryList['sb_message']);
$dbQueryList['sb_message'] = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $dbQueryList['sb_message']);
OK, it works well, but now, in another script, I would like to do the opposite. So in my $dbQueryList['sb_message'] I could have a link like this: <a href="http://google.com" target="_blank">Google</a>, and I would like to just have http://google.com.
I cannot write the regex that does that. Could you help me, please? Thanks :)
Recommended answer
It's safer to use DOMDocument instead of a regex to parse HTML content.
Try this code:
<?php
function extractAnchors($html)
{
    $dom = new DOMDocument();
    // loadHTML() needs mb_convert_encoding() to work well with UTF-8 input.
    // Note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2.
    $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    $xpath = new DOMXPath($dom);
    // Replace every <a> that has an href with the href value as plain text.
    foreach ($xpath->query('//a') as $node) {
        if ($node->hasAttribute('href')) {
            // A text node is used instead of appendXML(), which fails on URLs
            // containing a raw "&" (for example in a query string).
            $newNode = $dom->createTextNode($node->getAttribute('href'));
            $node->parentNode->replaceChild($newNode, $node);
        }
    }
    // Serialize only the <body> element, then trim the "<body>" and "</body>"
    // tags themselves to get back only the original content.
    return mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, 'UTF-8');
}

$html = 'Some text <a href="http://www.google.com">Google</a> some text <img src="http://dontextract.it" alt="alt"> some text.';
echo extractAnchors($html);
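If the messages only ever contain links produced by the two preg_replace() calls in the question, those links all share one fixed shape (href first, then target="_blank"), so a narrow regex could undo them as well. The following is only a minimal sketch under that assumption. The function name unlinkGeneratedAnchors is hypothetical, and the pattern is not a general HTML parser, so for arbitrary markup the DOMDocument approach remains the safer choice.

```php
<?php
// Hypothetical helper (not part of the original answer): reverses only anchors
// of the exact form <a href="..." target="_blank">...</a>.
function unlinkGeneratedAnchors(string $message): string
{
    $result = preg_replace(
        '#<a href="([^"]*)" target="_blank">.*?</a>#is', // capture the href value
        '$1',                                            // keep only the URL
        $message
    );

    // preg_replace() returns null on error; fall back to the untouched input.
    return $result ?? $message;
}

$message = 'See <a href="http://google.com" target="_blank">Google</a> now.';
echo unlinkGeneratedAnchors($message); // See http://google.com now.
```

Note that links created by the second expression in the question come back as http://www.example.com rather than the bare www.example.com that was originally typed, because the "http://" prefix was added to the href when the link was built.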