Highlight keywords in a paragraph

Question

I need to highlight a keyword in a paragraph, as Google does in its search results. Let's assume that I have a MySQL database with blog posts. When a user searches for certain keywords, I want to return the posts that contain those keywords, but show only parts of each post (the paragraphs that contain the searched keywords) and highlight those keywords.

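My plan is this:

  • find the IDs of the posts whose content contains the searched keywords;
  • read the content of each of those posts again and put each word into a fixed-size buffer array (50 words) until I find the keyword.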

Can you help me with some logic, or at least tell me if my logic is OK? I'm in a PHP learning stage.
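
For posts stored as plain text, the buffer idea in the plan above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name excerptAround and its $window parameter are hypothetical, the keyword is assumed to be a single word, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the original post): returns up to $window
// words around the first word containing $keyword, with each occurrence of
// the keyword wrapped in a highlight span, or null if there is no match.
function excerptAround(string $text, string $keyword, int $window = 50): ?string
{
    if ($keyword === '') {
        return null;
    }
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $i => $word) {
        if (stripos($word, $keyword) === false) {
            continue;
        }
        // Start half a window before the match, clamped to the first word.
        $start = max(0, $i - intdiv($window, 2));
        $excerpt = implode(' ', array_slice($words, $start, $window));
        // Split around the keyword; with PREG_SPLIT_DELIM_CAPTURE the
        // odd-indexed parts are the matched keyword occurrences.
        $parts = preg_split(
            '/(' . preg_quote($keyword, '/') . ')/i',
            $excerpt,
            -1,
            PREG_SPLIT_DELIM_CAPTURE
        );
        $html = '';
        foreach ($parts as $j => $part) {
            $escaped = htmlspecialchars($part);
            $html .= ($j % 2 === 1)
                ? '<span class="highlight">' . $escaped . '</span>'
                : $escaped;
        }
        return $html;
    }
    return null; // keyword not found
}

echo excerptAround('one two three foo four five six', 'foo', 4), PHP_EOL;
```

Escaping each piece separately keeps the highlight markup intact while still neutralising any special characters in the post text.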

Answer

If it contains HTML (note that this is a pretty robust solution):

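$string = '<p>foo<b>bar</b></p>';
$keyword = 'foo';
$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
// Select every element whose text contains the keyword. Note that contains()
// is case-sensitive and the keyword is interpolated unescaped, so a keyword
// containing a double quote would break the query.
$elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
foreach ($elements as $element) {
    // Copy the live child list first, because replaceChild() modifies it.
    foreach (iterator_to_array($element->childNodes) as $child) {
        if (!$child instanceof DOMText) continue;
        $fragment = $dom->createDocumentFragment();
        $text = $child->textContent;
        // Split the text node around each case-insensitive match.
        while (($pos = stripos($text, $keyword)) !== false) {
            $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
            $word = substr($text, $pos, strlen($keyword));
            $highlight = $dom->createElement('span');
            $highlight->appendChild(new DOMText($word));
            $highlight->setAttribute('class', 'highlight');
            $fragment->appendChild($highlight);
            $text = substr($text, $pos + strlen($keyword));
        }
        // Compare against '' instead of using empty(), which would drop a
        // remainder of "0"; the cast covers old PHP versions where substr()
        // returns false at the end of the string.
        if ((string) $text !== '') $fragment->appendChild(new DOMText($text));
        $element->replaceChild($fragment, $child);
    }
}
$string = $dom->saveXML($dom->getElementsByTagName('body')->item(0)->firstChild);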

Result:

<p><span class="highlight">foo</span><b>bar</b></p>

And with:

$string = '<body><p>foobarbaz<b>bar</b></p></body>';
$keyword = 'bar';

You get (broken onto multiple lines for readability):

<p>foo
    <span class="highlight">bar</span>
    baz
    <b>
        <span class="highlight">bar</span>
    </b>
</p>

Beware of non-DOM solutions (like regex or str_replace), since highlighting something like "div" has a tendency to completely destroy your HTML. The DOM approach only ever highlights strings in the text content, never inside a tag.
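
To make that warning concrete, here is a small demonstration of what a plain string replacement does to markup when the keyword also appears in a tag name. The sample HTML and variable names are made up for illustration.

```php
<?php
// Illustrative input, not taken from the original answer.
$html = '<div class="post">A div is a block element.</div>';
$keyword = 'div';

// Naive approach: replaces every occurrence, including those in tag names.
$naive = str_ireplace(
    $keyword,
    '<span class="highlight">' . $keyword . '</span>',
    $html
);

echo $naive, PHP_EOL;
// Prints (on one line):
// <<span class="highlight">div</span> class="post">A
// <span class="highlight">div</span> is a block element.</<span class="highlight">div</span>>
```

The opening and closing div tags are no longer valid tags, which is exactly the breakage the DOM-based code avoids by touching text nodes only.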

Edit: Since you want Google-style results, here's one way of doing it:

function getKeywordStubs($string, array $keywords, $maxStubSize = 10) {
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $xpath = new DOMXPath($dom);
    $results = array();
    // Number of word groups to keep on each side of the keyword.
    $maxStubHalf = (int) ceil($maxStubSize / 2);
    foreach ($keywords as $keyword) {
        $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
        $replace = '<span class="highlight">'.$keyword.'</span>';
        foreach ($elements as $element) {
            $stub = $element->textContent;
            // Capture up to $maxStubHalf word groups before and after the keyword.
            $regex = '#^.*?((\w*\W*){'.
                 $maxStubHalf.'})('.
                 preg_quote($keyword, '#').
                 ')((\w*\W*){'.
                 $maxStubHalf.'}).*?$#ims';
            // Keep only the surrounding words and the keyword itself.
            $stub = preg_replace($regex, '\\1\\3\\4', $stub);
            $stub = str_ireplace($keyword, $replace, $stub);
            $results[] = $stub;
        }
    }
    // Nested elements yield identical stubs; drop the duplicates.
    $results = array_unique($results);
    return $results;
}

OK, so what that does is return an array of matches, each with up to $maxStubSize words around the keyword (namely up to half that number before it, and half after)...
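
To see the pattern in isolation, the sketch below builds the same regex for a plain string with two word groups per side. The sample text and the $half value are illustrative only.

```php
<?php
// Illustrative values, not taken from the original answer.
$keyword = 'bar';
$half = 2; // word groups to keep on each side of the keyword
$text = 'us to foo bar baz replace out';

// Same construction as in getKeywordStubs(), with $half in place of $maxStubHalf.
$regex = '#^.*?((\w*\W*){' . $half . '})('
    . preg_quote($keyword, '#')
    . ')((\w*\W*){' . $half . '}).*?$#ims';

// Keep group 1 (words before), group 3 (keyword) and group 4 (words after).
$stub = preg_replace($regex, '\\1\\3\\4', $text);

var_dump($stub); // string(15) "to foo bar baz "
```

Note that the first repetition after the keyword only consumes the space that follows it, so the trailing side ends up with one word fewer than the leading side.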

So, given a string:

<p>a whole 
    <b>bunch of</b> text 
    <a>here for</a> 
    us to foo bar baz replace out from this string
    <b>bar</b>
</p>

Calling getKeywordStubs($string, array('bar', 'bunch')) will result in:

array(4) {
  [0]=>
  string(75) "here for us to foo <span class="highlight">bar</span> baz replace out from "
  [3]=>
  string(34) "<span class="highlight">bar</span>"
  [4]=>
  string(62) "a whole <span class="highlight">bunch</span> of text here for "
  [7]=>
  string(39) "<span class="highlight">bunch</span> of"
}

So, then you could build your result blurb by sorting the list by strlen and then picking the two longest matches (assuming PHP 5.3+):

usort($results, function($str1, $str2) { 
    return strlen($str2) - strlen($str1);
});
$description = implode('...', array_slice($results, 0, 2));

This would result in:

here for us to foo <span class="highlight">bar</span> baz replace out from ...a whole <span class="highlight">bunch</span> of text here for 

I hope that helps... (I do feel this is a bit... bloated... I'm sure there are better ways to do this, but here's one way)...
