Match multiple terms within <body> tags
Problem description
I want to match any occurrence of a search term (or list of search terms) within the <body> tags of a document. My current solution uses preg (within a Joomla plugin):
$pattern = '/matchthisterm/i';
$article->text = preg_replace($pattern,"<span class=\"highlight\">\\0</span>",$article->text);
But this replaces every occurrence anywhere in the document's HTML, so I need to match the <body> tags first. Is this even the best way to achieve this?
EDIT:
OK, I've used simplehtmldom, but I need some help narrowing the match down to just the term. So far I've got:
$pattern = '/(matchthisterm)/i';
$html = str_get_html($buffer);
$es = $html->find('text');
foreach ($es as $term) {
//Match to the terms within the text nodes
if (preg_match($pattern, $term->plaintext)) {
$term->outertext = '<span class="highlight">' . $term->outertext . '</span>';
}
}
This highlights the entire text node rather than just the term. Is it OK to use preg_replace in here?
SOLUTION:
//Get the HTML and look at the text nodes
$html = str_get_html($buffer);
$es = $html->find('text');
foreach ($es as $term) {
//Match to the terms within the text nodes
$term->outertext = str_ireplace('matchthis', '<span class="highlight">matchthis</span>', $term->outertext);
}
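Note that the replacement above inserts the literal lowercase string, so a match such as "MatchThis" loses its original casing, and it handles only one term. Below is a minimal sketch of a variant that keeps the matched text as written and accepts a list of terms. The name highlight_terms is a hypothetical helper, not part of simplehtmldom or Joomla, and PHP 7.4+ is assumed for the arrow functions.

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of any
// term in a highlight span, keeping the matched text's original casing.
function highlight_terms(string $text, array $terms): string
{
    // Drop empty terms, which would otherwise match at every position.
    $terms = array_filter($terms, fn($t) => $t !== '');
    if (!$terms) {
        return $text;
    }
    // Longest first, so "matchthisterm" wins over "matchthis".
    usort($terms, fn($a, $b) => strlen($b) <=> strlen($a));
    // Escape regex metacharacters, including the "/" delimiter.
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pattern = '/' . implode('|', $quoted) . '/i';
    // $0 is the text exactly as it appeared in the input.
    // Fall back to the untouched text if the regex engine reports an error.
    return preg_replace($pattern, '<span class="highlight">$0</span>', $text) ?? $text;
}
```

Inside the loop it could be called as `$term->outertext = highlight_terms($term->outertext, $terms);`, where `$terms` is an assumed array of search strings. Because each text node still holds encoded entities, a term such as "amp" could match inside `&amp;`, so this remains a sketch rather than a complete solution.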
Answer
No, processing [X][HT]ML with regex is largely disastrous. In the simplest case for your example, this input:
<a href="/foo/matchthisterm/bar">bof</a>
gives quite thoroughly broken output:
<a href="/foo/<span class="highlight">matchthisterm</span>/bar">bof</a>
The proper way to do it is to use a real HTML/XML parser (for example DOMDocument::loadHTML or simplehtmldom), then scan and replace the contents of each text node separately. Finally, serialize the document back to an HTML string.
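As a rough illustration of the parser-based approach, here is a minimal sketch using PHP's built-in DOM extension. The function name highlight_in_html is made up for this example. It assumes a single search term and an HTML fragment as input, and it leaves character-encoding handling aside (DOMDocument::loadHTML does not assume UTF-8 by default).

```php
<?php
// Sketch: highlight a term in text nodes only, never in tags or attributes.
// highlight_in_html() is a hypothetical name, not a library function.
function highlight_in_html(string $html, string $term): string
{
    if ($term === '' || trim($html) === '') {
        return $html;
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Text nodes under <body>, skipping script and style contents.
    $xpath = new DOMXPath($doc);
    $query = '//body//text()[not(ancestor::script) and not(ancestor::style)]';
    $nodes = iterator_to_array($xpath->query($query));

    // The capture group makes preg_split return the matches as well.
    $pattern = '/(' . preg_quote($term, '/') . ')/i';
    foreach ($nodes as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                // Odd indexes hold the matched text, in its original casing.
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of <body>, since the input was a fragment.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the spans are created as DOM nodes rather than spliced into a string, the href attribute from the example above is left untouched while the link text can still be highlighted.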
An alternative for search-term highlighting is to do it in JavaScript. Since the browser has already parsed the HTML into a DOM, that saves you a processing step. See, e.g., this question for an example.