find-and-replace-in-html regular expression fails

Problem description

I have a regular expression that searches HTML content for certain keywords. It used to work, but now it fails and I don't understand why. (The regular expression came from this thread.)

$find = '/(?![^<]+>)(?<!\w)(' . preg_quote($t['label']) . ')\b/s';
$text = preg_replace_callback($find, 'replaceCallback', $text);

function replaceCallback($match) {
    if (is_array($match)) {
        $htmlVersion = $match[1];
        $urlVersion = urlencode($htmlVersion);
        return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion . '">' . $htmlVersion . '</a>';
    }
    return $match;
}
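For reference, here is a minimal sketch of how the pattern is meant to behave, assuming PHP 7.4+ (for the arrow function). The helper buildPattern() and the sample HTML are hypothetical. The only change from the pattern above is that '/' is passed to preg_quote() as the delimiter. The lookahead (?![^<]+>) rejects a keyword that is followed by '>' before any '<', a heuristic for "inside a tag", so the attribute value is skipped while the text content is wrapped, and \b stops "Tested" from matching.

```php
<?php
// Sketch only: the question's pattern, with '/' passed to preg_quote() so a
// label can never terminate the pattern early. buildPattern() is hypothetical.
function buildPattern(string $label): string
{
    // (?![^<]+>) : reject a position followed by "...>" with no "<" in between,
    //              a heuristic for "inside a tag" (e.g. an attribute value)
    // (?<!\w)    : the keyword must not start in the middle of a word
    // \b         : ...and must not end in the middle of a word
    return '/(?![^<]+>)(?<!\w)(' . preg_quote($label, '/') . ')\b/s';
}

// Hypothetical input: one "Test" in an attribute, one in text, one inside a longer word.
$html = '<a title="Test">Test</a> Tested';

$result = preg_replace_callback(
    buildPattern('Test'),
    fn(array $m): string => '<b>' . $m[1] . '</b>',
    $html
);

echo $result, "\n"; // <a title="Test"><b>Test</b></a> Tested
```

The lookahead is only a heuristic: plain text such as "Test > 3" is also skipped, because a '>' follows the keyword before any '<'.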

The error message points to the preg_replace_callback call and says:

Warning: preg_replace_callback() [function.preg-replace-callback]: Unknown modifier 't' in /frontend.functions.php  on line 43
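A likely cause, offered as an assumption because the label that triggered the warning is not shown: preg_quote() called without its second argument does not escape '/', so a label containing a slash ends the pattern early and PHP reads the remainder as modifiers. The sketch below uses a hypothetical label 'input/text', which produces an "Unknown modifier 't'" warning, and then applies the fix of passing '/' as the delimiter.

```php
<?php
// Hypothetical label containing the pattern delimiter '/'.
$label   = 'input/text';
$subject = 'some input/text here';

// Without the delimiter argument, preg_quote() leaves '/' untouched, so PHP
// treats the '/' after "input" as the closing delimiter and reads "text)\b/s"
// as modifiers. 't' is not a valid modifier, so the call fails and returns false.
$bad       = '/(?![^<]+>)(?<!\w)(' . preg_quote($label) . ')\b/s';
$badResult = @preg_match($bad, $subject); // @ only hides the warning for this demo

// Passing '/' as the second argument escapes it as '\/', and the pattern compiles.
$good       = '/(?![^<]+>)(?<!\w)(' . preg_quote($label, '/') . ')\b/s';
$goodResult = preg_match($good, $subject);

var_dump($badResult, $goodResult); // bool(false), int(1)
```

In the question's code, the equivalent change is preg_quote($t['label'], '/').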


Recommended answer

Please note: this is not an attempt to fix the regex. It is only here to show how difficult (dare I say impossible) it is to write a regex that reliably parses HTML. Even well-structured XHTML would be nightmarishly difficult, and poorly structured HTML is a no-go for regular expressions.
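One commonly used regex-free alternative, which this answer does not itself propose, is to load the HTML with PHP's DOM extension and search only text nodes, so tag names and attribute values are never touched. This is a minimal sketch, assuming the dom extension is available and the input is ASCII or otherwise compatible with DOMDocument::loadHTML()'s default encoding handling. linkKeyword() is a hypothetical helper, and its link markup is simplified from the question's callback.

```php
<?php
// Regex-free sketch using PHP's DOM extension; linkKeyword() is hypothetical.
// Only text nodes are searched, so tag names and attribute values are never touched.
function linkKeyword(string $html, string $keyword): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a single root element and let libxml tolerate
    // imperfect markup without emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect the text nodes first, because the loop below modifies the tree.
    $xpath     = new DOMXPath($doc);
    $textNodes = iterator_to_array($xpath->query('//text()'));

    // A regex is still used, but only on plain text, never on markup.
    $pattern = '/(?<!\w)(' . preg_quote($keyword, '/') . ')\b/';

    foreach ($textNodes as $node) {
        // With a capturing group, preg_split() keeps each match at an odd index.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // keyword not present in this text node
        }

        $parent = $node->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $doc->createElement('a');
                $link->setAttribute('class', 'tag');
                $link->setAttribute('href', '?tag=' . urlencode($part));
                $link->appendChild($doc->createTextNode($part));
                $parent->insertBefore($link, $node);
            } elseif ($part !== '') {
                $parent->insertBefore($doc->createTextNode($part), $node);
            }
        }
        $parent->removeChild($node);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$linked = linkKeyword('<p title="Test">Test and <em>Test</em></p>', 'Test');
echo $linked, "\n";
```

Because the parser decides what is text and what is markup, the keyword in title="Test" is left alone, and the nested <em> case needs no special handling.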

I agree 100% that using regular expressions to parse HTML is a very bad idea. The following code runs the supplied pattern and callback against some simple HTML. It trips up on the second attempt, where the label is the HTML fragment <em>Test</em>:

$t['label'] = 'Test';
$text = '<p>Test</p>';

$find = '/(?![^<]+>)(?<!\w)(' . preg_quote($t['label']) . ')\b/s';
$text = preg_replace_callback($find, 'replaceCallback', $text);

echo "Find:   $find\n";
echo 'Quote:  ' . preg_quote($t['label']) . "\n";
echo "Result: $text\n";

/* Returns:

Find:   /(?![^<]+>)(?<!\w)(Test)\b/s
Quote:  Test
Result: <p><a class="tag" rel="tag-definition" title="Click to know more about Test" href="?tag=Test">Test</a></p>

*/

$t['label'] = '<em>Test</em>';
$text = '<p>Test</p>';

$find = '/(?![^<]+>)(?<!\w)(' . preg_quote($t['label']) . ')\b/s';
$text = preg_replace_callback($find, 'replaceCallback', $text);

echo "Find:   $find\n";
echo 'Quote:  ' . preg_quote($t['label']) . "\n";
echo "Result: $text\n";

/* Returns:

Find:   /(?![^<]+>)(?<!\w)(Test)\b/s
Quote:  Test
Result: <p><a class="tag" rel="tag-definition" title="Click to know more about Test" href="?tag=Test">Test</a></p>
Warning: preg_replace_callback() [function.preg-replace-callback]: Unknown modifier '\' in /test.php  on line 25
Find:   /(?![^<]+>)(?<!\w)(\<em\>Test\</em\>)\b/s
Quote:  \<em\>Test\</em\>

Result: 

*/

function replaceCallback($match) {
    if (is_array($match)) {
        $htmlVersion = $match[1];
        $urlVersion = urlencode($htmlVersion);
        return '<a class="tag" rel="tag-definition" title="Click to know more about ' . $htmlVersion . '" href="?tag=' . $urlVersion . '">' . $htmlVersion . '</a>';
    }
    return $match;
}
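The empty "Result:" line in the second run appears because preg_replace_callback() returns null when the pattern fails to compile. Here is a small sketch, assuming PHP 7.4+. replaceOrThrow() is a hypothetical helper that surfaces the warning as an exception. It also shows that when the same <em>Test</em> label is quoted with '/' as the delimiter, the pattern compiles and simply finds no match in <p>Test</p>.

```php
<?php
// Hypothetical helper: turn preg_replace_callback()'s silent null (returned when
// the pattern fails to compile) into an exception carrying the warning text.
function replaceOrThrow(string $pattern, callable $callback, string $subject): string
{
    error_clear_last();
    $result = @preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        $error = error_get_last();
        throw new RuntimeException($error['message'] ?? 'preg_replace_callback() failed');
    }
    return $result;
}

$wrap  = fn(array $m): string => '<b>' . $m[1] . '</b>';
$label = '<em>Test</em>';

// Quoted WITH the delimiter: the pattern compiles, finds no match,
// and the subject comes back unchanged.
$ok = replaceOrThrow('/(' . preg_quote($label, '/') . ')/', $wrap, '<p>Test</p>');

// Quoted WITHOUT the delimiter: the call throws instead of yielding an empty result.
try {
    replaceOrThrow('/(' . preg_quote($label) . ')/', $wrap, '<p>Test</p>');
    $failed = false;
} catch (RuntimeException $e) {
    $failed = true;
    echo $e->getMessage(), "\n";
}
```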
