Match multiple terms within <body> tags


Question

I want to match any occurrence of a search term (or list of search terms) within the <body> tags of a document. My current solution uses preg_replace (within a Joomla plugin):

$pattern = '/matchthisterm/i';
$article->text = preg_replace($pattern,"<span class=\"highlight\">\\0</span>",$article->text);

But this replaces every match anywhere in the document's HTML, so I need to match the <body> tags first. Is this even the best way to achieve this?

EDIT: OK, I've used simplehtmldom, but just need some help getting to the correct term. So far I've got:

$pattern = '/(matchthisterm)/i';
$html = str_get_html($buffer);
$es = $html->find('text');
foreach ($es as $term) {
    //Match to the terms within the text nodes 
    if (preg_match($pattern, $term->plaintext)) {
        $term->outertext = '<span class="highlight">' . $term->outertext . '</span>';
    }
}

This highlights the entire text node rather than just the matched term. Am I OK to use preg_replace in here?

SOLUTION:

//Get the HTML and look at the text nodes
$html = str_get_html($buffer);
$es = $html->find('text');
foreach ($es as $term) {
    //Match to the terms within the text nodes
    $term->outertext = str_ireplace('matchthis', '<span class="highlight">matchthis</span>', $term->outertext);
}

Answer

No, processing [X][HT]ML with regex is largely disastrous. In the simplest case for your example, this input:

<a href="/foo/matchthisterm/bar">bof</a>

gives quite thoroughly broken output:

<a href="/foo/<span class="highlight">matchthisterm</span>/bar">bof</a>

The proper way to do it is to use a real HTML/XML parser (for example DOMDocument::loadHTML or simplehtmldom), then scan and replace the contents of each text node separately. Finally, re-save the HTML back to a string.
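
As a rough illustration of the parser-based approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name highlightTerms, the wrapper <div>, and the choice to skip <script> and <style> contents are assumptions made for this example, not part of the original post. It also assumes PHP 7.4+ and ASCII (or entity-encoded) input; other encodings would need extra handling when loading.

```php
<?php
// Hypothetical helper (not from the original post): wraps every occurrence of
// the given terms that appears in a text node in <span class="highlight">,
// leaving tag names and attribute values untouched.
function highlightTerms(string $html, array $terms): string
{
    if (count($terms) === 0) {
        return $html;
    }

    // One case-insensitive alternation; preg_quote escapes regex metacharacters.
    $quoted  = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pattern = '/(' . implode('|', $quoted) . ')/iu';

    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The wrapper <div> is an assumption of this sketch: it keeps the parser
    // from adding <p> elements around loose text in a fragment.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Snapshot the text nodes first so the tree can be modified safely.
    $textNodes = iterator_to_array(
        $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]')
    );

    foreach ($textNodes as $node) {
        // With DELIM_CAPTURE, odd indexes hold the matched terms.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no term in this text node
        }

        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Called as highlightTerms('<a href="/foo/matchthisterm/bar">bof</a> MatchThisTerm here', ['matchthisterm']), this should leave the href intact and wrap only the term in the trailing text, keeping its original capitalization. In a Joomla plugin, the result would presumably be assigned back to $article->text.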

An alternative for search term highlighting is to do it in JavaScript. Since the browser has already parsed the HTML into a DOM, that saves you a processing step. See e.g. this question for an example.
