PHP: get 10 words around a search phrase

Question

I am trying to do the following:

Grab the 5 words before the search phrase (or Y, if only Y words are there) and the 5 words after the search phrase (or Y, if only Y words are there) from a block of text. By "words" I mean words, numbers, or whatever else is in the block of text.

For example:

The block of text: "Welcome to Stack Overflow! Visit your user page to set your name and email."

If you were to search for "visit your", it would return: "Welcome to Stack Overflow! Visit your user page to set your"

I tried using this:

$preg_safe = str_replace(" ", "\s", preg_quote($search)); 
$pattern = "/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/ix";
if(preg_match_all($pattern, $full_text, $matches))
{ 
    $result = str_replace(strtolower($search), "<span class='searched-for'>$search</span>", strtolower($matches[0][0])); 
}
else
{ 
    $result = false; 
}

It works if the search phrase is in English, but I need it to work in other languages as well. For example, it doesn't work for a Hebrew search phrase.

I tried changing the pattern to:

$pattern = "(*UTF8)/(\w*\S\s+){0,8}\S*\b($preg_safe)\b\S*(\s\S+){0,8}/i";

But that didn't work.

How can I make it work for other languages?

////////////////// EDIT //////////

As enrico.bacis suggested, I've changed the pattern to:

$pattern = "/(\w\p{Hebrew}*\S\s+){0,20}\S*\b($preg_safe)\b\S*(\s\S+){0,20}/ixu";

Now it works for English and Hebrew search phrases, but the resulting text is cut off when it contains a special character (' for example).

How can I make the pattern return the text around the search phrase even if it contains special characters?

Recommended answer

Your problem is that \w does not match Hebrew characters. By default, \w is just shorthand for the so-called "word" characters: [A-Za-z0-9_].
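A quick check can illustrate the point. This is only a minimal sketch, assuming a UTF-8 encoded script and PHP's default "C" locale; the sample word and the variable names are made up for illustration and are not part of the original post.

```php
<?php
// Sample Hebrew word, chosen only for illustration.
$word = 'שלום';

// Without the u modifier the subject is read byte by byte, and none of the
// UTF-8 bytes of a Hebrew letter fall in [A-Za-z0-9_], so \w+ cannot match.
$withoutU = preg_match('/^\w+$/', $word);

// With the u modifier and an explicit Hebrew script class the word matches.
$withU = preg_match('/^[\w\p{Hebrew}]+$/u', $word);

var_dump($withoutU); // int(0)
var_dump($withU);    // int(1)
```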

To make the regex match Hebrew characters as well, you only need to make two changes:

  • Add u to the modifiers so the pattern and subject are handled as UTF-8 (your modifiers become /ixu).

  • Replace every occurrence of \w in your pattern with [\w\p{Hebrew}].
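For the follow-up about special characters, one possible approach is to stop describing what a word looks like and treat every run of non-whitespace as a word. The sketch below is not the code from the question or the answer: `excerpt_around()` is a hypothetical helper, and swapping \b for `\p{L}`/`\p{N}` lookarounds is an assumption about what "whole phrase" should mean. It relies only on the u modifier and Unicode properties, so it is not tied to Hebrew.

```php
<?php
// Hypothetical helper: returns the phrase with up to $words words on each
// side, or null if the phrase is not found.
function excerpt_around(string $text, string $search, int $words = 5): ?string
{
    // Split the phrase on whitespace and quote each piece for use in a regex.
    $parts = preg_split('/\s+/u', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (!$parts) {
        return null; // empty phrase or invalid UTF-8
    }
    $quoted = array_map(function ($p) { return preg_quote($p, '/'); }, $parts);
    $phrase = implode('\s+', $quoted);
    $n = max(0, $words);

    // A "word" is any run of non-whitespace (\S+), so apostrophes, commas and
    // letters of any script are all kept. The lookarounds play the role of \b:
    // the phrase must not be glued to another letter or digit.
    $pattern = '/(?:\S+\s+){0,' . $n . '}\S*(?<![\p{L}\p{N}])' . $phrase
             . '(?![\p{L}\p{N}])\S*(?:\s+\S+){0,' . $n . '}/iu';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'Welcome to Stack Overflow! Visit your user page to set your name and email.';
echo excerpt_around($text, 'visit your'), "\n";
// Welcome to Stack Overflow! Visit your user page to set your
```

Because the surrounding words are matched with \S+ rather than \w, punctuation attached to a word no longer ends the match early. The highlighting step from the question can then be applied to the returned string as before.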

You can also check here for more answers on this topic.
